“POWER” VS. “SPEED” IN ARMY ALPHA
G. M. RUCH
and
WILHELMINE KOERTH
State University of Iowa 


Introduction.—The recent controversial literature on intelligence 
testing has raised anew certain questions concerning the validity of 
so-called speed tests for purposes of mental measurement. The 
experimental evidence bearing on the general question of speed versus 
power tests is unfortunately very meager. The most important 
study hitherto reported was one carried out during the war by Dr. 
Mark A. May under the direction of Dr. Lewis M. Terman in which the 
correlation was computed between scores earned on Army Alpha during 
regular time limits with those earned during double time limits. 
The cases involved numbered 510 and the coefficient of correlation 
was found to be 0.965. The conclusion drawn from this investigation, 
in the words of the report,¹ was that “In general, then, we have no
reason to assume that an extension of time limits would have improved
the test or have given an opportunity to many individuals materially
to alter their ratings.” As the writers of the report point out, this
result is to be interpreted as indicating that, although the absolute 
scores are admittedly raised by the extra time allowance, the rank 
orders of the 500 men are not markedly changed by the additional 
time. Further reference will be made to this army study after the 
new data have been presented. 

1 Psychological Examining in the United States Army. Memoirs of the National
Academy of Sciences, Vol. XV, 1921, p. 416.

Psychologists have been rather generally agreed that time limits,
if not too strictly drawn, do not work an undue hardship on the
subjects taking mental tests in the great majority of cases. The fact
that very few crucial experiments have actually been carried out does
not necessarily imply that there are insufficient grounds for this belief.
The evidence is, however, of a more indirect nature than one might 
wish. It rests on a wide variety of fairly well established facts such as: 

1. The fact that there has been shown to exist moderately high, or 
high, correlation between speed and accuracy in tests of mental 
capacities like arithmetical abilities, naming of opposites, 
substitution tests, etc. 

2. The fact that repetitions of the same tests give fairly uniform 
results; which, in other words, means that they show a satis- 
factory amount of self-correlation or reliability. 

3. The fact that under practice individuals tend to gain, from 
period to period, in a consistent manner so that there exists 
high correlation between initial and subsequent performances. 

4. And finally, the fact that the time limits for all carefully stand- 
ardized tests have been experimentally determined in such a 
way as to guarantee that the majority of subjects do have time 

to complete all or nearly all of the test items which are within 
their capacities. 

Such evidences are real even if admittedly too indirect to convince 
everyone. It is with the hope that additional direct evidence might 
be brought forward that the authors have undertaken to extend 
and otherwise supplement the previously reported work with the 
Army Intelligence Examination Alpha. The present study has also 
its shortcomings which will be pointed out as the occasions arise in 
the discussion. 

The General Nature of the Investigation.—With the cooperation
of the Dean of the College of Liberal Arts of the State University of 
Iowa, the writers called in for the purposes of this experiment 122 
freshmen who had previously taken the regular entrance intelligence 
examination during October, 1922. This group was made up as 
follows: 

(a) Seventy students who earned percentile scores falling in the
lowest decile of the total distribution of scores, i.e., between
the first and tenth percentiles.

(b) Fifty-two students who earned percentile scores falling within
the highest decile of the total distribution of scores, i.e.,
between the ninetieth and one-hundredth percentiles.

This group of 122 students is quite representative of the greatest
extremes of talent of the entire freshman class, although, of course,
constituting a highly selected group in comparison with an unselected
adult population. The sexes are approximately equally represented.
The percentile ranks were based upon the combined scores of four 
intelligence tests as follows: the Thorndike Intelligence Examination 
for College Entrance, Part I (two forms and the practice form), 
Morgan’s Test of Mental Ability, and the Iowa Comprehension Test. 
The total working time for this battery of tests is somewhat more than 
two hours. The actual scores of the groups used in the investigation 
are not reported here because the previous test scores were used only
to select the two groups already described. These groups will here-
after be referred to as the “high” and “low” groups.

The detailed description of the experiment follows. The students 
were requested to report to one of the large assembly rooms at 3 P. M. 
and at that time a brief explanation of the purpose of the meeting was 
given by the Dean of the College of Liberal Arts. They were simply 
told that at the time of the Fall examination some of them complained 
that their scores were too low because they were not given sufficient 
time to complete the tests, and that feeling that there might be some 
truth in their belief, we had decided to give them a second chance on 
another test in which they would be given all the time that they cared 
to use. They were instructed further not to surrender their papers 
until they were absolutely satisfied that no further amount of time 
would raise their scores by as much as one point. Every effort was 
made to induce every subject to attempt each item in the entire test. 
This was not quite realized because in a very few cases students 
refused to record their guesses and attempts on items that they felt 
were entirely too difficult for them. The effect of these instructions 
was to reassure any nervous subjects and to create a good working 
spirit. 

The three stages of the experiment will be described in succession. 

Period 1.—The entire group was given Form 7 of Army Alpha
under the strict procedure of the Examiner’s Manual. During this 
period of the test all of the subjects were required to work with ordi- 
nary black-leaded pencils. No variations from the standard procedure 
were allowed in order that the results would be strictly valid for com- 
parison with the norms for Army Alpha. At the end of the test the 
black pencils were collected. 

Period 2.—Blue-leaded pencils were passed out to all of the sub- 
jects, who were then told that the test would be repeated under 
the same time limits and that they could make any corrections or 
additions that they wished. The further order was given that no 
erasures were to be made in the black pencil responses. Corrections
could and must be made by merely indicating the correction by 
placing the blue pencil mark where it belonged, allowing the old 
record to stand. The students were assured that the blue responses 
would be given precedence over the black ones in all cases. This 
procedure made it possible to trace out exactly in which period each 
right and each wrong response was made. Scores for single and 
double time could thus be separately computed. The regular tech- 
nique of Alpha was followed in the second period except that the 
instructions for certain of the tests were slightly abbreviated. At 
the end of this period the blue pencils were collected and a few minutes 
intermission given. 

Period 3.—The subjects were given red-leaded pencils for their 
further work. During the intermission the booklets of all of the 
subjects who indicated that they had completed all of the work 
within their capacities were examined. In case all of the items had 
been attempted and it seemed likely that the subject had really 
“worked himself out,” he was permitted to surrender his booklet
and leave the examination. However, even under these circum-
stances, each subject was urged to continue longer. A considerable
number of the “high” group could not be prevailed upon to spend
more time and were consequently excused. Those remaining for the 
third period were then informed that they might take as much time to 
finish as they still needed. They were told to go from test to test 
without directions from the examiner, perfecting those portions of the 
test that were not yet completed. In the third period, the instructions 
for Test 1 were read once more, this making the final reading. In 
the case of this test the regular time limits were again observed. 
During the third period the subjects were excused as they finished, 
the usual inspection of the booklet being adhered to. The 
total working time for this period was indicated upon the test 
booklet. 

As has been suggested, the technique used allowed the computation 
of the scores earned in single, double, and unlimited time, separately. 
It should be pointed out here that, due to the fact that some of the 
most rapid workers finished in double time, the scores of such individ- 
uals are identical for double and unlimited times. This really amounts 
to self-correlation in these cases, but this can be defended upon the 
basis that the identity of the two sets of scores is real evidence that 
power scores are actually being considered. 
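
The three-pencil bookkeeping described above can be sketched in code. This is only an illustration under stated assumptions: the data layout, the function names, and the rights-only scoring are hypothetical (the actual Alpha scoring rules differ from test to test), and the four-item booklet is invented. The one rule taken from the text is that a later pencil color takes precedence over an earlier one while the earlier mark is left standing.

```python
from typing import Dict, List, Optional

# Pencil colors in the order they were issued (periods 1, 2, 3).
PERIODS = ["black", "blue", "red"]

def effective_answer(marks: Dict[str, Optional[str]], upto: str) -> Optional[str]:
    """Return the answer in force at the end of period `upto`.

    A later color overrides an earlier one; earlier marks are never erased,
    so the record of every period can be recovered separately.
    """
    answer = None
    for period in PERIODS[: PERIODS.index(upto) + 1]:
        if marks.get(period) is not None:
            answer = marks[period]
    return answer

def scores_by_period(booklet: List[Dict[str, Optional[str]]], key: List[str]) -> Dict[str, int]:
    """Count correct items at the end of each period (assumed rights-only scoring)."""
    return {
        period: sum(effective_answer(marks, period) == right
                    for marks, right in zip(booklet, key))
        for period in PERIODS
    }

# Invented four-item booklet, for illustration only.
key = ["a", "b", "c", "d"]
booklet = [
    {"black": "a"},               # right in single time
    {"black": "x", "blue": "b"},  # corrected in double time
    {"red": "c"},                 # reached only in unlimited time
    {"black": "d", "blue": "x"},  # right answer spoiled in double time
]
print(scores_by_period(booklet, key))
```

Because no mark is erased, the single-time score is unaffected by anything written later, which is what allows the three sets of scores to be compared.
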
The examination began at 3 P. M. and the slowest subject finished
his test at 5.50 P. M., using a total working time equal to approxi-
mately 414 times the regular time limits.

Statement of the Results.—The scores for single, double, and unlim-
ited time were tabulated separately for purposes of statistical analysis.




























































































[Scatter diagram, Figure 1. Vertical axis: double time. Dots: “High” group; crosses: “Low” group.]


The important results are set forth in the following series of figures 
and tables. Figure 1 shows the scatter diagram for the correlation 
of the scores earned in single and double time. Figure 2 does the 
same for single time and unlimited time. The scores of the high group 
are shown by the dots and those of the low group by the crosses. It 
will be noted that there is practically no overlapping, a result which 
would be expected in view of the fact that the two extreme deciles are 
alone concerned. 
Fig. 1. The correlation for the total scores earned in single time with the
total scores earned in double time.

Fig. 2. The correlation for the total scores earned in single time with the
total scores earned in unlimited time.

The correlations shown in Figures 1 and 2 are given in Table I.
The Pearson product-moment formula was used.

TABLE I
                                                 r      PE      N
Single time with double time..................  0.966  0.004   122
Single time with unlimited time...............  0.945  0.007   122
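
The probable errors printed in Table I agree with the classical formula for the probable error of a correlation coefficient, PE = 0.6745(1 − r²)/√N. The text does not say which formula was used, so the following is a sketch under that assumption, checked only against the two printed values; the function name is illustrative.

```python
import math

def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# The two coefficients of Table I, each based on N = 122.
for r in (0.966, 0.945):
    print(f"r = {r:.3f}  PE = {probable_error_of_r(r, 122):.3f}")
```

Rounded to three places, the two results are 0.004 and 0.007, matching the table.
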


Since the general appearance of Figure 2 suggests that the distri-
bution is not quite rectilinear, the correlation ratios (η) were also com-
puted. These are 0.967 and 0.965 (uncorrected for errors due to
grouping). The relationship is probably somewhat closer than that
suggested by the product-moment coefficient of correlation. The
agreement between the correlation of single and double time with
that reported in the Army figures is striking. The range of talent
involved in the two investigations is approximately equal (Table II)
although the general mental level of the Iowa group is greatly above
that of the 510 soldiers, the means being 127.6 and 62.0, respectively,
for single time.

TABLE II
                                                        ARMY         IOWA
                                                     EXPERIMENT   EXPERIMENT
Mean, single time..................................     62.0        127.6
Mean, double time..................................     80.5        149.6
Mean, unlimited time...............................     ....        155.0
Gain, double over single time......................     18.5         22.0
Gain, unlimited over double time...................     ....          5.4
Gain, unlimited over single time...................     ....         27.4
Standard deviation, single time....................     35.0         38.2
Standard deviation, double time....................     42.2         34.4
Standard deviation, unlimited time.................     ....         31.9
Coefficient of variation, single time..............    0.565        0.299
Coefficient of variation, double time..............    0.524        0.230
Coefficient of variation, unlimited time...........     ....        0.206

The most important difference between the two
groups seems to lie in the fact that the variability of the Army 
group increased under added time while that of the college group 
decreased, probably due to the fact that many of the high college 
group were drawing near to perfect scores and hence had less oppor- 
tunity to gain than was the case with the low college group. This 
decreasing variability of the college group is a rather general tendency 
throughout all of our data and operated as a serious limitation to the 
adequacy of the present technique. Even with this constantly increas- 
ing curtailment of the range of scores at the upper end of the college 
group, it is to be noted that the Iowa group gained more (22.0 points) 
than the Army group (18.5 points) in double time over single time. If 
these figures are trustworthy, it would appear that the more intelligent 
subjects gain more than the less intelligent subjects, the college group 
being considered as more intelligent than the Army men as is clearly
evident from the mean scores. The curtailment of the scores of the 
college group at the upper end of the distribution has very probably 
operated also to decrease the correlations between single and double 
time, and again between single time and unlimited time. 
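
The gains and coefficients of variation quoted for the two experiments follow directly from the means and standard deviations of Table II. The sketch below recomputes them; the dictionary layout and function names are illustrative, and the coefficient of variation is taken as the standard deviation divided by the mean, as the paper itself defines it.

```python
# Means and standard deviations of total Alpha score, from Table II.
TABLE_II = {
    "Army": {"mean": {"single": 62.0, "double": 80.5},
             "sd": {"single": 35.0, "double": 42.2}},
    "Iowa": {"mean": {"single": 127.6, "double": 149.6, "unlimited": 155.0},
             "sd": {"single": 38.2, "double": 34.4, "unlimited": 31.9}},
}

def gain(group: str, later: str, earlier: str) -> float:
    """Difference between two condition means, rounded to one decimal."""
    means = TABLE_II[group]["mean"]
    return round(means[later] - means[earlier], 1)

def coefficient_of_variation(group: str, condition: str) -> float:
    """Standard deviation divided by the mean, rounded to three decimals."""
    stats = TABLE_II[group]
    return round(stats["sd"][condition] / stats["mean"][condition], 3)

for group, conditions in (("Army", ["single", "double"]),
                          ("Iowa", ["single", "double", "unlimited"])):
    for condition in conditions:
        print(group, condition, coefficient_of_variation(group, condition))

print("Army gain, double over single:", gain("Army", "double", "single"))
print("Iowa gain, double over single:", gain("Iowa", "double", "single"))
```

The recomputed values show the pattern discussed in the text: the Army standard deviation rises from single to double time while the Iowa standard deviation falls, and the relative variability falls in both groups.
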

Figure 3 shows the relative gains of the high and low groups in the 
Iowa experiment. The enforced curtailment of the scores of the good 
group under increased working time is clearly evident in the curves.
The amount of overlapping remains approximately the same and the 
low group did not seriously threaten to overtake the high group 
although the lead of the latter was somewhat cut down. Further 
discussion of this point will be postponed until certain additional 
facts have been presented. 

The correlations between single time and double time, and single 
time and unlimited time, test by test, are given in Table III. The 
corresponding Army figures have also been included. 
TABLE III

          Single time with double time     Single time with unlimited time
Test          Iowa          Army                        Iowa
 1            [illegible in source]                     0.775
 2            0.932         0.937                       0.880
 3            0.455         0.879                       0.271
 4            0.925         0.940                       0.896
 5            0.875         0.902                       0.817
 6            0.921         0.960                       0.866
 7            0.905         0.920                       0.886
 8            0.945         0.910                       0.919


With the exception of Test 3 (Practical Judgment), the two
sets of coefficients are in remarkable agreement. Test 3 is undoubt-
edly too easy for the college group and too much of a speed test to
stand up under the conditions of the experiment.

Table IV shows the means, standard deviations, and coefficients
of variation for the Iowa group, test by test. The coefficients of
variation, here as elsewhere, are the ratios between the standard
deviations and the means (Pearson’s method).

TABLE V
Test                         1     2     3     4     5     6     7     8
Double over single........  1.76  1.93  4.01  2.36  2.24  2.31  4.31  2.72
Unlimited over double.....  0.46  1.17  0.44  0.46  0.36  1.21  0.79  0.35
Unlimited over single.....  2.22  3.10  4.45  2.82  2.60  3.52  5.10  3.07
Army results:
Double over single........  ....  1.16  3.53  2.08  3.14  1.19  3.80  4.20

These gains for the separate tests cannot be compared with much
meaning since the steps between the various items are of quite
unequal difficulties from test to test. Likewise the gains in the
several tests by the Iowa group cannot be compared directly with the
Army group since the two groups were working at widely different
levels on the total test. To gain one score point from 175 as a base
may be considerably more difficult than to gain 1 point with a score
of 75 as a base. Certain tests like No. 7 (Analogies) permitted rather
marked gains to be made but Test 1 (Directions), Test 4 (Synonym-
Antonym), and Test 5 (Disarranged Sentences) allowed compara-
tively little gain from single to unlimited time. It may be concluded
that with the exceptions of Test 2 (Arithmetical Problems) and Test
6 (Number Series Completion) there would seem to be absolutely no
justification for allowing more than double time at most. These are
the two tests of the battery that involve mathematical abilities.
In but one other case (Test 7, Analogies) did the gain of unlimited time
over double time exceed five-tenths of a point.
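
The statements about Table V can be checked mechanically. The sketch below, with illustrative variable names, confirms that the two partial gains for the Iowa group add up to the total gain for every test, and lists the tests in which more than half a point was gained after double time.

```python
TESTS = [1, 2, 3, 4, 5, 6, 7, 8]

# Iowa gains in mean score, test by test, from Table V.
double_over_single = [1.76, 1.93, 4.01, 2.36, 2.24, 2.31, 4.31, 2.72]
unlimited_over_double = [0.46, 1.17, 0.44, 0.46, 0.36, 1.21, 0.79, 0.35]
unlimited_over_single = [2.22, 3.10, 4.45, 2.82, 2.60, 3.52, 5.10, 3.07]

# The two partial gains should sum to the total gain (within rounding).
additive = all(
    abs(first + second - total) < 0.011
    for first, second, total in zip(
        double_over_single, unlimited_over_double, unlimited_over_single
    )
)

# Tests that still gained more than five-tenths of a point beyond double time.
late_gainers = [
    test for test, g in zip(TESTS, unlimited_over_double) if g > 0.5
]

print("rows additive:", additive)
print("gain beyond double time > 0.5 on tests:", late_gainers)
```

The list returned is Tests 2, 6, and 7, that is, the two mathematical tests and Analogies, in agreement with the paragraph above.
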

Table VI gives the means, test by test, for the high and low groups
separately, period by period. The maximum score for each test is
given for comparison with the obtained scores.



































TABLE VI

          Single time       Double time      Unlimited time    Maximum
Test      High     Low      High     Low      High     Low      score
 1       10.10    6.14     11.10    8.47     11.35    9.09      12.00
 2       14.10    8.64     16.79   10.00     17.85   10.27      20.00
 3       12.38    8.73     14.52   13.44     14.63   14.37      16.00
 4       29.15   14.30     31.84   16.29     31.96   16.61      40.00
 5       21.00   12.21     22.52   16.97     22.61   17.41      24.00
 6       15.50    8.30     18.21   10.44     18.75   12.02      20.00
 7       36.58   19.21     38.54   25.01     38.54   26.39      40.00
 8       30.29   19.19     31.46   23.06     31.46   23.30      40.00

Table VII shows the gains in the mean scores for the two groups
computed from Table VI.
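
How such gains follow from Table VI can be sketched as below. The dictionaries copy the printed means of Table VI; the layout and names are illustrative, and the output is a recomputation from those means, not a transcription of Table VII.

```python
# Mean scores from Table VI: test -> (single, double, unlimited).
HIGH = {
    1: (10.10, 11.10, 11.35), 2: (14.10, 16.79, 17.85),
    3: (12.38, 14.52, 14.63), 4: (29.15, 31.84, 31.96),
    5: (21.00, 22.52, 22.61), 6: (15.50, 18.21, 18.75),
    7: (36.58, 38.54, 38.54), 8: (30.29, 31.46, 31.46),
}
LOW = {
    1: (6.14, 8.47, 9.09), 2: (8.64, 10.00, 10.27),
    3: (8.73, 13.44, 14.37), 4: (14.30, 16.29, 16.61),
    5: (12.21, 16.97, 17.41), 6: (8.30, 10.44, 12.02),
    7: (19.21, 25.01, 26.39), 8: (19.19, 23.06, 23.30),
}

def gains(means):
    """Return (double - single, unlimited - double, unlimited - single)."""
    single, double, unlimited = means
    return (
        round(double - single, 2),
        round(unlimited - double, 2),
        round(unlimited - single, 2),
    )

# Tests on which the high group's total gain exceeded the low group's.
high_gained_more = [t for t in HIGH if gains(HIGH[t])[2] > gains(LOW[t])[2]]

for t in HIGH:
    print(t, "high:", gains(HIGH[t]), "low:", gains(LOW[t]))
print("high group gained more on tests:", high_gained_more)
```

On these means the high group shows the larger total gain only on Tests 2 and 4; on every other test the low group gained more.
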

Table VIII shows the standard deviations of the scores of the
high and low groups, test by test, and period by period.

Table IX shows the coefficients of variation computed from the
data given in Tables VI and VIII.

TABLE IX

          Single time       Double time      Unlimited time
Test      High     Low      High     Low      High     Low
 1       0.124   0.314     0.081   0.231     0.071   0.212
 2       0.173   0.213     0.136   0.229     0.087   0.253
 3       0.176   0.175     0.089   0.116     0.079   0.081
 4       0.188   0.384     0.168   0.388     0.164   0.380
 5       0.126   0.432     0.089   0.304     0.089   0.220
 6       0.126   0.282     0.078   0.295     0.062   0.289
 7       0.081   0.434     0.046   0.366     0.046   0.322
 8       0.142   0.276     0.122   0.184     0.122   0.180








Examination of the relative scores of the two groups tabulated 
in Tables VI, VII, VIII and IX reveals several interesting differences. 
Figures 4 and 5 which follow show the same facts in terms of percent- 
age distributions of the scores for the two groups in the separate 
tests for single and unlimited time. Double time is not shown in 
these graphs. 

In the first place, in almost every test there is good evidence that 
the high group suffered under the handicap of being forced to work at 
the upper part of the test series where there was little opportunity for 
gain after the single time period. The scores of the high group are 
crowding the maximum possible scores so closely that there results 
marked curtailment of the distributions in almost every test. This 
shows in the Table of Means, in the Tables of Gains, and in the Tables 
of Variability with great consistency. The low group, except in one 
or two tests, suffered under no such limitations. Their variabilities 
and gains were almost uniformly greater than those of the high group. 
In no case is the high group as consistently variable as the low group. 
In two of the tests, Test 2 (Arithmetical Problems) and Test 4 
(Synonym-Antonym), the high group improved more than the low 
group as a result of the added time allowances. In all other tests 
the low group improved more. However, in no test did the low group 
ever reach the level of performance of the high group with the possible 
exception of Test 3 (Practical Judgment) which, as has already been 
suggested, appears to be far too easy to have much value with superior 
adults except when administered as a speed test. 

That the low group gained more than the high group is to be expected
in view of the limited possibilities of the latter in earning additional
points. The Army results were exactly opposed in this respect when
increases in absolute score points are considered, although, as the Army
report¹ points out, in terms of percentages, the poorer subjects gained
most (a fact which is a mathematical function of the differences in
the bases from which the percentages are computed).

Summary and Conclusions.—It has become increasingly evident
in the preceding discussion that Army Examination Alpha is not en-
tirely satisfactory as an instrument for studying the factors of speed
versus power, at least with intelligent adults. The reason for the choice
of Alpha for this investigation was principally that of verification of
the work reported in Psychological Examining in the United States
Army. Our results verify the Army work in all important conclusions
as far as the conditions are strictly comparable.

1 See Table 74, p. 416, op. cit.

For use with college students a test with more “top” is demanded.
The Thorndike Intelligence Examination for College Entrance would
probably prove more satisfactory for work with adults. Neverthe-
less, there is one very good reason for the use of Army Alpha in the
present study that should be emphasized, viz., that Alpha is probably
typical of what we term speed tests. The Thorndike Test places 
much more of a premium on power, or long-sustained effort. One of 
the writers (G. M. R.) is at present engaged upon further studies of 
this general problem with other group tests of intelligence and educa- 
tional abilities, using children young enough to obviate the possibility 
of approach to perfect scores even with the most intelligent subjects. 
Under such conditions it is thought likely that the bright subjects 
might be found to improve more, not less, than the duller ones. This 
possibility is strongly suggested by the earlier work of Thorndike and 
his students on learning and by more recent but as yet unpublished 
studies of one of the authors (G. M. R.) upon the learning of children 
of the same chronological ages but widely differing intelligence 
quotients. 

With this general limitation in mind and properly allowed for in 
interpreting the results of the present study, the following conclusions 
are suggested as being indicated by our data: 

1. Admitting that Army Alpha is largely a speed test, the fact that 
single time correlates 0.966 with double time, and 0.945 with 
unlimited time indicates that the speed factor does not seriously 
invalidate the test. In fact, it can be shown from figures 
already presented that the probable error of estimating scores 
for double time and for unlimited time from scores earned in 
single time is about 6.7 and 8.4 score points, respectively.¹

2. Increasing the time allowances does not permit dull subjects 
to equal the scores of the more intelligent subjects. In fact 
the mean of the low group for unlimited time was still well 
below the mean of the high group for single time (Figures 1-3). 
Whether the differences between high and low groups are 
decreased or augmented by increasing the time limits cannot 
be definitely answered from the present data because the scores 
of the high group were too near the maximum possible to 
allow equal opportunity to both groups. The Army figures 
seem to indicate that, in terms of absolute scores, the brighter 
subjects improve somewhat more than do the dull. 

3. When total scores are considered, there was no increase in the 
amount of overlapping of the high and low groups when the 
time allowances were increased. 


4. Rather marked differences in susceptibility to the influence
of the speed factor were evident in the different tests, certain
ones being much more open to this objection than others.
This fact suggests a new criterion for the validation of tests,
scales, and other types of measurements. The only test which
was invalidated by the increased time limits was Test 3 (Prac-
tical Judgment). Test 1 (Oral Directions) is also not very
satisfactory. Test 2 (Arithmetical Problems) and Test 4
(Synonym-Antonym) became even more discriminating with
added time.

The present results substantiate the important findings of the
earlier Army investigation when proper allowance is made for
the fact that Alpha is far too easy for the good students of the
college group. The correlations reported by the two investiga-
tions are in striking agreement.

1 Computed from the usual formula:
$PE_{(est.)} = 0.6745\,\sigma_1\sqrt{1 - r_{12}^{2}}$
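
The probable errors of estimate quoted in the first conclusion (about 6.7 and 8.4 score points) can be reproduced from the footnoted formula. The paper does not say which standard deviation was entered; the sketch below assumes the single-time standard deviation of the Iowa group (38.2, Table II), chosen only because it reproduces the printed figures. The function and constant names are illustrative.

```python
import math

def probable_error_of_estimate(sd: float, r: float) -> float:
    """Probable error of estimate: 0.6745 * sd * sqrt(1 - r^2)."""
    return 0.6745 * sd * math.sqrt(1.0 - r * r)

# Assumption: the standard deviation used is that of the single-time scores.
SD_SINGLE_TIME = 38.2

pe_double = probable_error_of_estimate(SD_SINGLE_TIME, 0.966)
pe_unlimited = probable_error_of_estimate(SD_SINGLE_TIME, 0.945)

print(f"estimating double-time scores:    PE = {pe_double:.1f}")
print(f"estimating unlimited-time scores: PE = {pe_unlimited:.1f}")
```

Under this assumption the results round to 6.7 and 8.4 score points. Entering the standard deviation of the predicted scores instead (34.4 or 31.9) gives smaller values, so the printed figures appear to rest on the single-time standard deviation.
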

SOCIAL RATING OF BEST AND POOREST HIGH
SCHOOL STUDENTS 
PAUL V. SANGREN 
Department of Education 
Western State Normal School, Kalamazoo, Michigan 


It is apparent that a fairly high correlation exists between scholar- 
ship and scores on intelligence tests. It would be interesting to know 
(1) whether there are qualities in addition to intelligence which affect 
scholarship, and (2) whether the best students do not also possess 
greater possibilities of success in the practical world. Upon these two 
points the writer, while superintendent of schools at Zeeland, Michigan, 
had opportunity to collect some data. 

Zeeland high school is small, the enrollment being 165 students. 
This, together with the fact that many of them had been in the 
system one or more years, made it possible for the 9 high school 
teachers to become well acquainted with the students. Teachers 
could, therefore, be counted upon to pass fairly intelligent judgment 
upon the qualities and traits possessed by individual students. 

All high school teachers constructed independently a rating scale 
using the following eight abilities or qualities: methods of work, 
application, industry and attitude toward work, ability to assimilate 
new ideas, physical vigor, social and personal qualities, leadership, and 
team-work. There were 5 degrees for each of these qualities with
numerical ratings to correspond, these degrees and ratings being
divided as follows: best student, 38; better than average, 30;
average, 22; poorer than average, 14; poorest student, 6. The
scale was, therefore, modelled after Form B of Rugg’s Rating Scale 
for Judging High School and College Students. To construct the 
scale the teachers filled the blank spaces with the names of high school 
students who were thought to possess the qualities in the various 
degrees. Explanations concerning the meaning of each quality or 
ability were given the teachers in printed instructions. 

Having completed the construction of their scales, the teachers 
rated 24 high school students by direct comparison with their own 
scales. Of the students rated, 12 were students who made high scores 
on Terman’s Group Test of Mental Ability and averaged over 90 per 
cent in all of their school work for the months of September, October, 
and November, and 12 were students who made low scores on the 
Terman test and averaged less than 83 per cent in all of their school
work for the three school months mentioned. Of each of the two
groups of 12 students, 3 were selected from each of the four high 
school grades. Teachers were instructed to pass judgment upon no 
student whom they did not know well and to rate numerically. They 
were not told how the students were selected. 

When the ratings were completed, it was found that no student
was rated by fewer than 4 teachers or by more than 7. While
this is not a large number of ratings for a single student, it is equal to,
if not greater than, the number of ratings made as a rule in judging
the efficiency of a teacher. For simplicity we will call the group of
best students the “best group” and the group of poorest students the
“poorest group.” We will also call the qualities of methods of work,
application, industry and attitude toward work, and ability to assimilate
new ideas the “scholarship qualities” and the qualities of physical
vigor, social and personal qualities, leadership, and team-work the
“citizenship qualities.”


TABLE I.—TEACHERS’ AVERAGE RATINGS OF “BEST” AND “POOREST” GROUPS
IN SCHOLARSHIP QUALITIES

                                              Group
Scholarship quality                       Best    Poorest
Methods of work........................   32.1     15.1
Application............................   33.      17.6
Industry and attitude toward work......   33.1     16.7
Assimilation of new ideas..............   32.0     12.7
Average................................   32.5     15.5











When the average ratings of each individual teacher were considered,
it was found that they consistently rated the “best group” as
“better than average” in the scholarship qualities. There were
one or two exceptions among the teachers, but none of them rated
this group below average in these qualities. Furthermore, the teachers almost
consistently rated the students of the “poorest group” as “poorer
than average” in these same qualities. This will appear upon examination
of Table I, which merely presents the average ratings in scholarship
qualities for both groups. Tables covering the average ratings
received by each individual student will not be included here. Figure
A, showing in graphic form the ratings of Grade IX students of both
the groups in intelligence, scholarship, and scholarship qualities, will
give a clear idea of how teachers judged individual students. A study
of this figure will make it evident that students above average in
intelligence and in scholarship are rated above the average in scholarship
qualities, while students below average in intelligence and
scholarship are rated below average in scholarship qualities. The
same facts would be evident if individual ratings of all students of the
remaining three grades were included.
Figure A.—Comparative rating of three best and three poorest high school
freshmen. Broken lines represent poorest students; unbroken lines represent
best students.





Table II shows the average ratings of both groups of students in
the “citizenship qualities.” The teachers were fairly consistent in
rating the best group of students somewhat above average in these
qualities and in rating the poorest group “poorer than average.”
This will be seen when the average ratings of the groups are compared
as in Table II. Figure B will show graphically the ratings of individual
students of Grade XI in the two groups. Here the
comparisons are made between ratings in intelligence, scholarship,
and citizenship qualities. As a rule, students who are rated above
average in intelligence and scholarship are rated above average in the
citizenship qualities, while students below average in intelligence
and scholarship are rated below average in citizenship qualities. If
comparisons were made for individuals of the other three grades, the
results would be similar. It seems that there is a tendency, however,
for a slighter discrimination between best and poorest students in
citizenship qualities as we pass up through the years in high school
and reach Grade XII. Shall we say that this is due to elimination,
or shall we say that it is due to training in high school?


TABLE II.—TEACHERS’ AVERAGE RATINGS OF “BEST” AND “POOREST” GROUPS
IN CITIZENSHIP QUALITIES

                                              Group
Citizenship quality                       Best    Poorest
Physical vigor.........................   24.5     16.6
Social and personal qualities..........   27.2     14.4
Leadership.............................   26.2     11.1
Team-work..............................   26.0     13.6
Average................................   26.0     13.9
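The figures of Table II can be checked with a short Python sketch. Only the label “Social and personal qualities” is legible in the table as it stands; the other three labels are an assumption that the rows follow the order in which the text lists the citizenship qualities. The variable names are illustrative.

```python
# Average ratings from Table II as (best group, poorest group).
# Row labels other than "Social and personal qualities" are ASSUMED to follow
# the order in which the text lists the citizenship qualities.
TABLE_II = {
    "Physical vigor": (24.5, 16.6),
    "Social and personal qualities": (27.2, 14.4),
    "Leadership": (26.2, 11.1),
    "Team-work": (26.0, 13.6),
}

# Difference between the two groups on each quality.
gaps = {q: round(best - poorest, 1) for q, (best, poorest) in TABLE_II.items()}
smallest = min(gaps, key=gaps.get)

# Column means, for comparison with the last row of the table.
mean_best = sum(b for b, _ in TABLE_II.values()) / len(TABLE_II)
mean_poorest = sum(p for _, p in TABLE_II.values()) / len(TABLE_II)

print(gaps)
print(smallest, mean_best, mean_poorest)
```

Under that assumption the smallest difference falls on physical vigor, and the column means agree to within rounding with the last row of the table (26.0 and 13.9), which suggests that the last row is an average.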





It is also interesting to note that the slightest difference which
occurs between members of the best group and members of the poorest
group in citizenship qualities is in the quality of physical vigor. But
in spite of this fact the best students have the greater physical vigor.
Correlations of intelligence with total ratings in scholarship qualities
and in citizenship qualities, and of scholarship with the same two sets
of ratings, were calculated by the Spearman Rank Order
Method. The results were as follows:


Intelligence and scholarship qualities....................... 0.85 

Scholarship and scholarship qualities....................... 0.93 

Intelligence and citizenship qualities....................... 0.77 

Scholarship and citizenship qualities....................... 0.92 
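
The rank-order method named above can be sketched as follows. The sketch uses the usual Spearman formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which holds when there are no tied values. The scores below are hypothetical and serve only to show the computation; the data of the 24 students are not given in the article.

```python
def ranks(values):
    """Rank values from 1 (highest) downward; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank-order correlation for two untied lists of equal length."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data for six students: test scores and total ratings.
test_scores = [180, 172, 165, 98, 90, 85]
rating_totals = [130, 126, 128, 70, 58, 62]

rho = spearman_rho(test_scores, rating_totals)
print(round(rho, 3))
```

Only the order of the students enters the computation, so the coefficient shows how far two measures agree in ranking the students and says nothing of the size of the differences between them.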
CONCLUSIONS 


When scores on the Terman Group Test of Mental Ability are taken
as measures of intelligence and average marks in all subjects for
September, October, and November are taken as measures of scholarship,
the teachers make the following judgments concerning “scholarship”
and “citizenship” qualities possessed by the poorest and best
high school students at Zeeland:
1. The twelve students of the four high school grades who are
rated high in intelligence and scholarship are rated “better than
average” in industry and attitude toward work, methods of
work, application, and ability to assimilate new ideas. The
twelve students of the four high school grades who are rated
low in intelligence and scholarship are rated “poorer than
average” in these same qualities.

Figure B.—Comparative rating of three best and three poorest high school
juniors. Broken lines represent poorest students; unbroken lines represent
best students.


2. When average ratings of individual students are considered,
those who are above average in intelligence and scholarship are
rated above average in the qualities mentioned in Conclusion 1,
while students individually rated below average in intelligence
and scholarship are rated below average in these qualities.


3. The correlation between intelligence and the total rating in
methods of work, industry and attitude toward work, application,
and ability to assimilate new ideas is high, being 0.85.
The correlation between scholarship and these qualities is very
high, being 0.93.


4. The 12 students who are rated high in intelligence and scholarship
are rated somewhat above average in the qualities of
physical vigor, social and personal qualities, leadership, and
team-work, while those who are rated low in intelligence and
scholarship are rated “poorer than average” in these qualities.
This holds true in the individual rating and comparison of
students as well.


5. The correlation between intelligence and the total rating in
physical vigor, social and personal qualities, leadership, and
team-work is fairly high, being 0.77. The correlation between
scholarship and the qualities just mentioned is very high,
being 0.92.

Thus it would appear (1) that the scholarship of high school students
is determined by the student’s methods of work, application, industry
and attitude toward work, and ability to assimilate new ideas as much
as by intelligence; (2) that the best students possess in a greater degree
the qualities which will make for success in the practical world; and
(3) that, although the brilliant student is often heralded as a “freak,”
it would be safer to gamble upon his success than upon the success of a
poorer, less intelligent student.
A PRELIMINARY STUDY OF THE PROBLEMS IN
THE TRAINING OF THE NON-PREFERRED 
HAND 


DORA KEEN MOHLMAN 


Bureau of Educational Research, University of Illinois, Urbana, Illinois 


The purpose of this study is (1) to designate certain points of 
significance in the training of the non-preferred hand; (2) to make some 
suggestions for the technique to be employed in this training; (3) to 
suggest problems for experimentation in this field; (4) to present a 
rather comprehensive bibliography of the literature dealing with the 
various problems of handedness. 

In order to acquaint the reader with the technical terminology of
the present study as well as of other studies in the field of handedness,
the meanings of certain widely used terms are stated briefly in the
following paragraph.

The general fact of uneven-handedness is denoted by the term
dextrality. The hand which is inferior to the other in dexterity and
strength, and which in consequence is less frequently used, is spoken
of as the non-preferred hand. A person who is naturally right-handed
is spoken of as a dextral; a person who is naturally left-handed as a
sinistral. An individual is designated as a dextro-sinistral if he is
left-handed but has learned to write with his right hand. This term
may also be used to designate all left-handed persons whose right
hand has been trained to carry on any activity, other than that of
handwriting, natural to the left hand. Either of the terms ambidexterity
or ambidextrality may be used to describe the condition
wherein neither hand is preferred over the other and where each
hand can be used alternately to perform the same task.


I. SIGNIFICANT POINTS 


Value of Ambidextrality—The question as to whether all children 
should be trained to make an equal use of both hands has been given 
some attention. In certain of the discussions a purely practical appeal 
for ambidextrality is made. This appeal is based on the increase in 
efficiency which would thus result in the performance of the movements 
required by trades or professions and by the necessary acts of daily 
life, and on the benefit which would accrue in case of the loss of the 
naturally preferred hand. Most of the authorities, as well as the mass 
of laymen, however, are still of the opinion that ambidextrality has
little practical value. 

Certain investigators, among whom are Kipiani (86) and Macnaughton
(104), advocate the symmetrical education of both sides of
the body as a means of preventing aphasia, St. Vitus’s dance, tic, and
various disorders of the nervous system. The medicinal value of 
this education lies in the fact that both sides of the brain will be made 
to function and its latent force brought into play. They recommend 
also teaching patients who are suffering from war aphasia to write and 
draw with the left hand in order to develop the functions which have 
a seat in the right cerebral hemisphere. The work of these authors, 
however, does not seem to have been conclusive, for the training for 
ambidextrality as a valuable means of prevention or cure of nervous 
disorders has not received general recognition. 

Speech Disturbance.—The opinion that teaching the left-handed 
child to use his right hand causes disturbances in the mechanism of 
speech has been widely discussed and seems to have been rather 
widely accepted. This opinion is largely based on the results of 
Ballard’s (14) investigations of London school children in which 
figures are given which purport to show that training the left-handed 
child to write with his right hand produces speech defects. Wallin
(159) in his “Report on Speech Defectives in the St. Louis Public
Schools,” however, advances evidence which “corroborates only
mildly, if indeed at all, Ballard’s conclusions.” On seeing the preliminary
report of Wallin’s investigation, Dr. Ballard writes as follows:
“I regard my investigation as more or less preliminary or suggestive
rather than conclusive.” Many, perhaps, would find a greater
degree of corroboration between Ballard’s and Wallin’s conclusions
than Wallin himself does, yet it is evident that the results so far
obtained have not definitely settled the question of the relation of
left-handedness and speech defect.

War and Industrial Cripples—The people of the United States,
up to a very recent date, have been indifferent toward the industrial
cripples of the country. It has been little known or realized that each
year’s toll of men permanently disabled through industrial accidents
approximates 70,000 or 80,000. Of these approximately 6000
suffer the entire or partial loss of an arm. The vital problem of the
War’s crippled is turning our attention, however, to the perhaps even
more vital problem of industry’s crippled. It is very probable that
in the near future we shall have institutions for the disabled in industry
similar to the hospitals and schools for the rehabilitation, re-education,
and vocational training of the disabled soldiers and sailors.

1 Rubinow, I. M.: A Statistical Consideration of the Number of Men Crippled
in War and Disabled in Industry. Publications of Red Cross Institute for Crippled
and Disabled Men, Series 1, No. 4, New York, 1918, pp. 17.

In conditions such as these a study of the problems connected 
with the use of the non-preferred hand becomes more deeply mean- 
ingful than ever before. 

Methods of Diagnosing Handedness.—A study by Arthur L. Beeley 
(18) contains an excellent discussion of the problems connected with 
diagnosing the native handedness of children together with tests 
which many would agree “will render diagnosis of handedness more 
accurate than is possible by any other existing test or method.” 

In an endeavor to answer at least partially the question as to what 
can be done to bring about a more adequate adjustment of the left- 
handed child to his right-handed environment, Beeley sets as his 
task (1) the derivation of a test or tests for diagnosing the native 
handedness of children; (2) a discussion of the relation between left- 
handedness and “mirror writing.” 

He evaluated certain existing tests for diagnosing handedness 
and stated their limitations. The tests discussed were the strength 
of grip, the tapping, the tracing, and the steadiness tests described 
by Whipple (164); and the Brachiometer test used by Jones (73) 
according to his theory that the right- or left-handed child is born 
with larger bones on his right or left side respectively. 

As a method of procedure in the derivation of his test, Beeley 
chose certain existing and suggestive tests and correlated their results 
in diagnosing handedness with the actual facts of handedness in a 
group of children from the Grades III, IV, V, and VI. The hand the 
child used most frequently was determined by the child’s statement 
corroborated by the teacher. It was assumed that a test involving 
dexterity would be superior to a test of skill or endurance. 

The tracing, tapping, and steadiness tests described by Whipple 
were studied. In the tracing test B, a specially designed apparatus 
was used whereby contacts were broken automatically, thus elimi- 
nating the objection to the old tracing test that a constant contact 
or one of long continuation is registered as one contact only. 

Among the conclusions which were reached, the following seem the 
most significant to the present study: 
1. The finger-tapping test is superior to the wrist-tapping test
because: (a) Its diagnoses correlated more perfectly with the known
facts; (b) it revealed a greater difference between the two sides of the
body. By reason of similar results, the wrist-tapping test was judged
to be superior to the arm-tapping test.

2. There existed a perfect positive correlation between the finger-
tapping test and the known facts.

3. The tracing test B evinced all characteristics of a valid test
for diagnosing handedness because: (a) Its results correlated perfectly
with the known facts; (b) the lower the grade, the greater the
difference it revealed between the dexterity of the two hands.
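
The comparison of a test’s diagnoses with the known facts of handedness might be sketched as below. The scoring rule is an assumption made for illustration, namely that the hand giving more taps in the same interval is taken as the preferred hand; Beeley’s actual scoring is not described here. The function names and the tapping records are likewise hypothetical.

```python
def diagnose(right_taps, left_taps):
    """Diagnose handedness as the hand giving more taps in the same interval.

    Assumed rule for illustration; a tie is arbitrarily counted as "right".
    """
    return "right" if right_taps >= left_taps else "left"


def agreement(records):
    """Fraction of children whose test diagnosis matches the known hand."""
    hits = sum(diagnose(r, l) == known for r, l, known in records)
    return hits / len(records)


# Hypothetical records: (right-hand taps, left-hand taps, known handedness).
children = [
    (162, 140, "right"),
    (131, 155, "left"),
    (170, 149, "right"),
    (128, 150, "left"),
]

print(agreement(children))
```

An agreement of 1.0 on such records corresponds to what the conclusions above call a perfect positive correlation between the test and the known facts.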

Mirror Writing.—In Beeley’s experimentation concerning the relation
between left-handedness and “mirror writing,” 42 out of 106,365
school children were found to be “mirror writers,” that is, about one out of 2500.
There appeared to be a perfect positive correlation between “mirror
writing” and left-handedness. Only about one per cent of the
left-handed children, however, were “mirror writers.” Baldwin’s
(9) theory was advanced in this study as the best explanation of the
cause of “mirror writing.” He claims that “mirror writing” in children
is probably due to the incomplete association of the series of hand-
movement sensations with the control series of visual sensations.
Beeley’s investigations resulted in the conclusion that “mirror writing”
did not necessarily have a positive correlation with mental deficiency
and certainly not with visual defect. He suggested as a method of
correction that the pupils should write from a copy, not from memory,
making first the incorrect, then the correct form of the letters with the
left hand.
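
The proportions quoted from Beeley can be worked through in a few lines. The two counts are from the text. The further step assumes that all 42 mirror writers were left-handed, as the reported perfect correlation suggests, and treats “one per cent” as exact, so the implied number of left-handed children is only a rough estimate. Variable names are illustrative.

```python
mirror_writers = 42
children = 106_365

# Number of children per mirror writer: 2532.5, i.e. about one in 2500.
one_in = children / mirror_writers

# ASSUMPTION: all mirror writers were left-handed, and they made up exactly
# one per cent of the left-handed children (the text says "approximately").
share_of_left_handed = 0.01
implied_left_handed = mirror_writers / share_of_left_handed

# Implied proportion of left-handed children among all children examined.
implied_left_rate = implied_left_handed / children

print(one_in, implied_left_handed, round(implied_left_rate, 3))
```

On these assumptions the figures imply roughly 4200 left-handed children, or about 4 per cent of those examined.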

According to Mlle. Joteyko (76), all beginners at left-hand writing
have a tendency to “mirror writing.” Other authors (1, 94) accept this
theory as very probable, and believe that the ease or difficulty with
which the natural tendency to “mirror writing” is overcome depends
on the strength or weakness of the visual factor in imagery, and that
the subject who makes little use of visual imagery is more likely to
fall into “mirror writing.”


II. SUGGESTIONS FOR TRAINING THE NON-PREFERRED HAND OR ARM 


Mlle. Joteyko (79) makes some rather general suggestions as to
methods for training the non-preferred hand and arm in cripples.
These suggestions are based on her interpretation of certain physiological
principles which in her opinion are involved. She states that
every attempt should be made to enable the cripple to resume his
former occupation. The left hand will thus perform the movements
formerly carried out by the right hand. In this way the apprentice
will reap advantage from the fact that one hand learns more quickly
that which has already been acquired by the other hand. She advances
the theory that in consequence of the bilateral symmetry of
the body, “mirror writing” or “mirror movement” is natural to the
left side. Therefore in training the left hand for either an old or a new
trade it should be forced to carry out the movements of the right hand
in inverse direction. The larger movements should be perfected first,
and the finer adjustments later. It is also stated that any effort put
forth by the left hand causes a greater fatigue to the heart than the
same effort put forth by the right hand and that this fact must be
taken into consideration in the training of the left hand. A change
of trade should be recommended when a physician’s examination
shows too great a strain on this vital organ.

Burnette (23) gives a brief but rather suggestive description of a 
method in use at the Whitby Military Hospital. The apprentice is 
taught to trace on frosted slates; first, simple geometric designs, 
sketches, pictures; then letters, and finally, whole words. The 
tracing automatically corrects the back-slope habit which arises 
because the index finger obscures the pupil’s work. After a fair con- 
trol over the left hand has been achieved, the apprentice’s work is 
varied by having him trace a design or word and then copy it on a slate. 
The next step is to copy on paper instead of on the slate. Individuals 
are trained to write both legibly and rapidly within 5 to 10 days
by this method. 

In this hospital modeling in plasticine has also been found to be
an excellent means of producing deftness of movement with the left 
hand. Here, too, a left-handed man who had to be educated in the use 
of his right hand learned direction by playing ping pong. 

Superintendent John G. Kerr (85) of the Pilkington Orthopaedic 
Hospital, Lancashire, trains certain of the pupils who are learning to 
write with the left hand, to write backwards, that is, the last of the 
letter, the last of the word, and the last of the line, first. Since only
a few have been found, however, who are very successful in acquiring
this backhand writing, “mirror writing” is more frequently taught
here. The writing is done on a sheet of paper placed on a piece of
carbon paper. On the reverse side of the sheet may be read what has
been written. “Mirror writing” is favored because: it is learned more
readily than backhand writing; it is more satisfactory in respect to
uniformity, legibility, and speed; and it is astonishingly like in all
respects the writing which was done by the right hand. Superintendent
Kerr offers no explanation of the reason why it is necessary
for the left hand to write in an inverse direction to that taken by the
right hand.

A six weeks course (107) given by various schools in Germany 
to fit the man with only a non-preferred arm to take up the training 
for a trade, beside others who are less seriously handicapped, includes 
instruction in the ordinary acts of life such as eating, dressing, tying 
knots, using simple tools, and writing. It is felt that at this point a 
great part of the teacher’s duty is to convince the men that all these 
things are possible and need only practice to be learned. In addition, 
practice in drawing, designing, and modeling in clay, with the left 
hand is often added as a means of functional re-education. 

In the description of this course it is stated that certain German 
teachers have made a scientific study of the question of left-hand 
writing, and that several text-books have been written on the subject. 
They are, however, unavailable to the present writer. 

Jules Amar (5) has made an exhaustive study of the problems of 
physical rehabilitation and functional re-education, and is considered 
one of the greatest present day authorities in this field. In a discussion
of the means for “la formation des gauchers” he points out the
following helpful exercises and says that their continued repetition at 
an increasing rate of speed guarantees an excellent training of the left 
hand within 5 or 6 weeks. 

With his left hand the patient should practice making blows with 
a hammer of one kilogram weight until he is able, regardless of the 
height to which the hammer is raised, to hit with a sharp, quick blow 
a small piece of crayon placed on an anvil. A record of the patient’s 
progress showing the increase in rate of speed and amplitude of stroke 
may be obtained by means of a device described by the author which is 
controlled by a pulley attached to the hammer with a small cord. The 
pupil should also practice until he has become skilful in tracing with 
his left hand an oval of 20 to 25 millimeters diameter and a square of 
25 millimeters cut out in a sheet of copper. This copper sheet should 
be placed on a paper and the edges of the openings followed with a pen 
or stylus. 

The most practical and the most carefully worked out system of
training the maimed man to write with his left hand, however, which
has yet reached us is the one of Albert Charleux (27), a young French
tutor. After suffering at the age of 14 the amputation of his entire
right arm he learned to write by the method he describes. Since the
beginning of the war he has devoted it to the instruction of the
disabled soldier.

He states that the left arm should not be expected to perform at 
first those small movements needed in writing. The first suppling 
of the arm and the first training in making the finer muscular adjust- 
ments should be obtained by practicing certain exercises—a series of 
straight, broken and circular lines—on the blackboard. The indicated 
direction of each line must be held. The number of lines and applica- 
tions in each exercise is not absolute. The apprentice can add more 
lines and devise more applications if he desires. At first the entire 
arm movement should be used. The movement should then be 
gradually restrained until the wrist movement is obtained. The 
exercises should be practiced in the order given, from the least to the 
most difficult, until the patient can easily, and to his own satisfaction, 
perform the entire series with the wrist movement. 

Writing on the blackboard is taken up next. Here also the 
apprentice restrains the arm movement with which he begins until a 
movement of wrist and finger is obtained. The same copy is repro- 
duced in letters of four different sizes. The heights of the four sizes 
are: 12-15 cm., 6-7 cm., 3-4 cm., and 1-2 cm. The work is continued
until a legible and a regular copy of the smallest model is secured with a 
wrist and finger movement. 

The apprentice is now ready to begin writing on paper. Some 
directions are given him as to the proper positions of his body and of 
the paper, the correct size of the pen, and the best method of holding 
the paper. A person with his right arm off above the elbow and no 
artificial arm has no natural means of support for his right side. He 
should be very careful to maintain his body in an upright position 
squarely in front of the desk. Leaning to the right side causes a low 
sloping shoulder and a posture which gives fatigue to the eyes. Since 
in writing with the right hand the paper is slightly inclined toward the 
left, in left-hand writing it should be inclined toward the right. A pen 
of medium size is best. The ordinary pen point is adequate as the 
beaks are symmetrical. The pen should point over the left shoulder 
and should be held loosely and easily by the first three fingers. The 
paper can be most easily held in place by means of a metal paper 
weight. It is advisable to have a handle placed near the top of the 
weight to facilitate its shifting. 
The names by which different parts of the letters are designated
in the French system of handwriting are also given. Some suggestions 
are set forth concerning the best angle of slant, the correct proportion 
between the different parts of the letters, the order in which the letters 
should be learned and their size in the four types of handwriting given, 
the Cursive, the Ronde—round, the Batarde—running, and the 
Gothique. Three different sizes of letters are given in each of the four 
types and it would seem that the procedure of going from large to 
small is to be carried out in writing on paper as it is elsewhere, although 
no statement is made to this effect. The kind of movement which 
should be used is not definitely given but it is practically certain 
because of the emphasis placed in earlier exercises on the wrist and 
finger movement that this movement is to be finally attained. 

In most of the large number of recent and current publications 
which describe the work of various schools in training the one-armed 
men, either the results are described without any indication of the 
means by which they were obtained or no method is mentioned other 
than that of “go at the job and practice until you can do it.” This
is true of writing as well as of other tasks. All specific suggestions 
and methods for the education of the non-preferred hand and arm 
which were available have been described in the present study. Some 
excellent sources for this material, however, are at present unavailable. 

The following conclusions summarize briefly the most important 
points brought out in the preceding discussion of the methods for 
training the non-preferred hand or arm: 

1. In training the non-preferred hand or arm to perform any set of 
movements, the larger adjustments should come first and the smaller, 
or finer, last. 

2. Practice in performing the thing to be learned is clearly the 
method for the training of the left hand and arm which is most in use 
at present. This method is without doubt very valuable in regard 
both to its practicality and to its efficiency in achieving results. 


III. SUGGESTED PROBLEMS FOR EXPERIMENTATION


1. The statements made by Joteyko, Schuyten, Kipiani, and 
Macnaughton suggest the need for additional evidence as to whether 
the training of the left hand or arm to perform hitherto unaccustomed 
movements contributes to the cure of patients suffering from aphasia 
and other nervous disorders. 
2. There has been no attempt, to the knowledge of the present
writer, to determine whether one hand is aided, judged from speed
and accuracy, in learning to perform a given task by the fact that the
other hand is skilled in that performance. Data on this question would
doubtless prove of great value.

3. Conclusive evidence as to whether an amount of work done by 
the left hand causes more strain on the heart than the same amount 
of work carried out by the right hand would be valuable. Unusual 
difficulties in framing a suitable plan of experimentation, however, 
would likely present themselves. 

4. The problem of securing a means for predicting man’s ability,
measured by speed and accuracy, in learning to use the non-preferred
hand, while perhaps less important than some of the preceding topics, 
presents an excellent field for experimentation. 

5. Evidence additional to that given by Wallin and Ballard is 
needed to settle definitely the question concerning the relation of 
left-handedness and speech defects. 
THE CONSISTENCY SHOWN BY INTELLIGENCE RATINGS
BASED ON STANDARDIZED TESTS AND
THE TEACHER’S ESTIMATES


J. E. WALLACE WALLIN 
Bureau of Special Education and Department of Clinical Psychology,
Miami University


The writer began to give Binet Tests and a dozen group tests of 
intelligence of his own construction in the latter part of 1909. The 
use of the Binet Tests after the 1908 revision became known spread 
like wild-fire throughout the country and throughout the civilized 
world, but the use of group tests for measuring intelligence made no 
headway whatever until the United States was drawn into the World 
War. 

The rapid spread of the Binet Scale was due partly to the great 
practical utility of the scale, and partly to the extravagant claims made 
regarding it. According to the propagandists’ claims “Binet’s plan
was perfect” for measuring “native intelligence”; the scale was a
“marvel of accuracy,” so “amazingly accurate” as to admit of little
or no improvement from ages 5 to 12; the scale provided a well-nigh
“infallible” means of determining whether a child was feeble-minded,
which could be effectively used even by “novices,” or “untrained or
wrongly trained persons,” “nothing else” being needed for diagnosing
“feeble-mindedness” in the great mass of cases than this test. After
attempting to subscribe to such opinions as these for a year wholly
devoted to clinical practice, the writer became thoroughly convinced
that they could not be substantiated, and the burden of most of his
papers and addresses during several years was to point out the nature
of the exaggerations and of the limitations and defects of the Binet
Scale with respect to its adequacy as an instrument for measuring
“native intelligence,” with respect to the accuracy of the age placement
of the individual tests and of the aggregate age standards, and
with respect to the sufficiency of the scale for diagnosing mental 
deficiency, particularly in the hands of “amateurs” and especially by 
means of certain arbitrary standards of intelligence defect which had 
received almost universal acceptance, and which he was forced from 
personal and direct study of a great variety of cases to reject almost in 
toto. 
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After over a decade of poignant criticism by numerous able writers 
here and abroad, the Binet scale continues to be used in various 
revisions as the best available scheme of tests for determining the 
“level of intelligence” at least verbal “intelligence.’”’ But nearly all 
qualified authorities now concede that even the latest editions of the 
scale are far from perfect either in internal construction or in 
standardization. 

The history of the extension and use of the group tests of intelli- 
gence parallels closely the development and application of the Binet 
scale. 

We have been assured that “intelligence” can be just as accurately 
measured by so-called “group tests of intelligence” as the depth of a 
well or a river can be measured by a physical measuring rod, and that 
children and adults can be accurately classified with respect to intelli- 
gence by group tests. One writer states that subjects from first grade 
level to university level can be accurately rated in intelligence by means 
of his group scale alone, and that any first grade teacher can on the 
first day of school after a 20-minute examination classify her pupils 
in regard to their intellectual ability, and section them accurately 
for the purpose of instruction. We have also been told that 
intelligence can be measured more accurately by group tests than by 
individual tests (the Binet Scale). Usually these claims are not based on 
mere assertion, pure ipse dixits. They have often been supported 
by the finding of high correlation coefficients between the group tests 
and the Binet Tests or some other criterion of “native ability” or 
“intelligence,” although it must be confessed that sometimes numerous 
correlations which have been exceedingly low or quite negative have 
been ignored or conveniently forgotten. In many cases, however, the 
claims made have been based upon the irresistible tendency of all 
propagandists and test publishers to exaggerate, and of some test 
designers to advertise the superior virtues of their own wares. 

Claims of this character are partly responsible for the amazingly 
rapid introduction of group intelligence testing into the lower and 
higher schools and into the institutions for dependents or defectives 
throughout the country, and the widespread practice of classifying or 
sectioning pupils solely or partly on the basis of the scores or IQ’s 
obtained from group intelligence tests. 

But the history of the Binet movement is now repeating itself. In 
the wake of the period of exaggeration there has recently supervened a 
period of analytical examination, and trenchant criticism of the 
assumptions, claims, values, validity, and uses of the group tests of 
intelligence. This phase of “reaction” should be welcomed and 
encouraged instead of being ignored, belittled or resisted, as was done 
by some of the Binet devotees in the early days of the examination 
and critique of the Binet Scale, who seemed to regard the scale as 
something immutable and sacrosanct. Practically all the early 
criticisms of the Binet Scale have been substantiated. It is only by 
searching and unhampered criticism that we shall eventually be able 
correctly to appraise the extent of the imperfections, the inevitable 
limitations, and the legitimate uses which may be made of existing 
group tests of intelligence. 

The following study of the agreement obtained in the intelligence 
rating of the first grade pupils in the Miami University practice school 
by the Pressey Primer, the Myers Mental Measure, the Detroit First 
Grade Intelligence Test, the Stanford Binet-Simon Scale and the 
teacher’s estimate, was carried out between February 24 and April 14, 
1922. One group test was given on each Friday morning, while the 
Binet testing was done on the three following Friday mornings. All 
of the group testing was done by my assistant, Miss Mildred Rothhaar; 
while the Stanford-Binet testing was done by six students who were 
taking the Psycho-Clinic Practicum offered by the Bureau, or by Miss 
Rothhaar or by myself. The students had had experience 3 hours per 
week in giving the Stanford-Binet under critical supervision since the 
beginning of the school year. The scoring of the group tests and the 
computation of the results were done by the students under the super- 
vision of my assistant, who checked over many of the results, including 
all of the correlation coefficients which were computed independently 
by two students (on a Burroughs Calculator). In the experience of 
the writer, correlation coefficients computed only once cannot be 
implicitly accepted as accurate, whether done by student or 
psychologist. 

The first grade critic teacher who estimated the intelligence of the 
pupils was a college graduate of considerable experience in the lower 
grades who had pursued various courses on intelligence and educational 
tests. She was asked to rank each pupil in the order of intelligence, 
assigning to each an arbitrary score on the basis of 100 for the ablest 
pupil. She was asked to consider the various factors usually empha- 
sized in the attempt to estimate children’s intelligence (namely, native 
ability or wit; demonstrated capacity or alertness or initiative, as seen 
in and out of school; presence of physical disease or defects; specific 
mental disabilities; social and educational advantages or 
disadvantages; age). She was not given the results of the tests until her 
estimates had been filed, but she observed the pupils during the testing, 
and had available prior test results for a few of the pupils. 
She also ranked the pupils according to their proficiency in the school 
work, but no use is made of these estimates in this article. 

The maximum number of pupils tested by any one scale was 42, 
and by all the scales, 34. The comparisons are based upon these 34 
(20 boys and 14 girls), whose average and median ages were, respec- 
tively, 6.8 and 6.6. The modal age was 6.9. 


TREATMENT OF THE RESULTS 


1. The correlation coefficient by the Pearson unabbreviated 
product moment method ($r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$) was computed between the scores 
of the different tests, and the teacher’s estimates and the test scores, 
Table I. In the case of the Binet both the absolute scores and the 
IQ’s have been used, while only the absolute or raw scores have been 
used in the group tests. Our earlier analysis of the correlation 
coefficients between the Binet and Pressey showed sharp discrepancies 
between the r based on the IQ, and the r based on the absolute scores.¹ 
Parallel discrepancies occur in this investigation, as will be seen 
presently, the smaller coefficients being derived in both investigations 
from the use of the IQ figures. It is evident that the conclusions 
drawn must be based on the raw scores, which represent the ultimate 
figures. Naturally the IQ values for the different tests are quite 
discrepant and incommensurable, as the tests are differently scaled. 
The Pressey is scaled on a maximum score of 100 points, the Detroit of 
50 points, and the Myers of about 140 points. 

2. The average difference in the rank order of the pupils, the range 
of the rank differences, and the percentage of rank differences exceeding 
10 steps, were ascertained between the absolute and relative (IQ) 
scores in the tests, and the teacher’s estimates, Table II. The figures 
representing the average differences are subject to a certain amount of 
uncorrectable error due to the existence of identical scores. 

Initial Selection of Presumptive Mental Defectives. School and Society, 1921, 

3. The percentage of agreements was ascertained in the placement 
of the pupils in the first (highest), second, third, and fourth quartiles 
by the different tests and teachers’ estimates, Table III. 

Here, again, the percentages are subject to an error because of 
the existence of identical scores. There seemed to be no satisfactory 
way of eliminating these errors, hence the identical scores were 
arranged in rank order according to chance. 

4. The differences were computed in the age-rating obtained by 
the pupils in the Pressey, Myers, Detroit and Stanford-Binet, Table 
V. Tentative norms for the Detroit Tests in terms of age and months 
were sent me by Miss Anna M. Engel, based on the examination of 
5039 children tested in June, 1921. For the Pressey and Myers, 
the scores were interpolated between the whole-year norms supplied 
with the tests. Extrapolated scores were used only when the values 
were merely slightly below or above the liminal values. 

In the case of the Pressey it was felt that the interpolated values 
would be more usable than the percentiles which accompany the 
norms. A margin of error necessarily results from the use of the 
interpolated and extrapolated norms. 
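
The interpolation described under step 4 can be sketched as follows. This is only an illustration under stated assumptions: the values in `norms` are invented placeholders and are not the published Pressey or Myers norms, and the function name `age_from_score` is hypothetical. A raw score lying between two whole-year norms is converted linearly; a score outside the table is extrapolated along the nearest segment, which the procedure above allows only for values slightly beyond the limits.

```python
from bisect import bisect_right

# Hypothetical whole-year norms: (age in years, expected raw score).
# These numbers are placeholders for illustration only.
norms = [(6.0, 20.0), (7.0, 32.0), (8.0, 46.0)]

def age_from_score(score, norms):
    """Convert a raw score to an age-rating by linear interpolation.

    Scores outside the norm table are extrapolated along the nearest segment.
    """
    scores = [s for _, s in norms]
    # Index of the segment that brackets the score, clamped to the table.
    i = min(max(bisect_right(scores, score) - 1, 0), len(norms) - 2)
    (a0, s0), (a1, s1) = norms[i], norms[i + 1]
    return a0 + (score - s0) * (a1 - a0) / (s1 - s0)

# A score halfway between the 6-year and 7-year norms.
print(round(age_from_score(26.0, norms), 2))
```

Because every interpolated rating inherits whatever error the whole-year norms contain, the margin of error mentioned above applies to each age-rating so obtained.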


STATEMENT OF EXPERIMENTAL RESULTS 


THE COEFFICIENTS OF CORRELATION 


Before analyzing the experimental data let us advert to the possible 
objection that the correlation figures obtained must necessarily be 
quite unreliable because of the limited number of cases. True. But 
the situation we present is absolutely typical. Tests are constantly 
being given throughout the country to classes of pupils as actually 
constituted, be they large or small. If the tests have any value at 
all they must be of service in the classification of small classes as well 
as large classes. Moreover, the reliability of the r’s, especially those 
based on the Binet absolute scores, may be assumed from the PE’s, 
Table I, to be quite high. The r’s based on the Binet absolute are 
from about 5 to 10 times as large as the PE’s. 
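
The relation between r and its PE can be checked with a short sketch. The article does not print the probable-error formula; the code assumes the conventional one, PE = 0.6745(1 - r²)/√N, which agrees with the Table I entries for the Binet absolute scores to within rounding when N = 34. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)

def probable_error(r, n):
    """Probable error of r under the assumed formula 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Reported coefficients with the Binet absolute scores (N = 34).
for name, r in [("Pressey", 0.455), ("Myers", 0.678), ("Detroit", 0.439)]:
    pe = probable_error(r, 34)
    print(f"{name}: r = {r:.3f}, PE = {pe:.3f}, r/PE = {r / pe:.1f}")
```

Under this assumption the ratio r/PE depends only on r and N, so with 34 pupils a coefficient must reach roughly 0.45 before it is five times its probable error.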

Based on the Binet IQ the r’s range from 0.10, with the Detroit, 
to 0.65, with the teacher. With the Pressey the r is 0.15, compared with 
0.03 found by the Ayres abbreviated method in the previous 
investigation to which reference has been made. 

TABLE I.—COEFFICIENTS OF CORRELATION (r) AND PROBABLE ERROR OF r (PE) 
BETWEEN THE BINET, PRESSEY, MYERS, AND DETROIT TESTS, AND THE 
TEACHER’S RANKING 

                 Binet absolute   Binet IQ       Pressey        Myers          Detroit        Teacher
                 r      PE        r      PE      r      PE      r      PE      r      PE      r      PE
Binet IQ         ....   ....      ....   ....    .158   .112    .531   .083    .105   .114    .653   .066
Binet absolute   ....   ....      ....   ....    .455   .091    .678   .062    .439   .093    .564   .078
Pressey          .455   .091      .158   .112    ....   ....    .379   .099    .508   .085    .380   .098
Myers            .678   .062      .531   .083    .379   .099    ....   ....    .464   .090    .557   .079
Detroit          .439   .093      .105   .114    .508   .085    .464   .090    ....   ....    .297   .105
Teacher          .487   .078      .653   .066    .380   .098    .557   .079    .297   .105    ....   ....

The figures in the group tests represent raw or absolute scores. 

Based on the Binet absolute scores the r’s range from 0.44, with 
the Detroit again, to 0.68 with the Myers. Myers has reported a 
much higher correlation between his scale and the Stanford Binet, 
namely “about .80 within each grade from the first to the eighth.” 
With this group of pupils the correlation with the Pressey was only 
0.45, as compared to 0.73 which we previously obtained with a larger 
group of children who averaged older in age. 

In the case of the group tests (absolute scores), the lowest correla- 
tion is 0.38, between the Pressey and the Myers, and the highest 0.50, 
between the Pressey and the Detroit. The highest correlation for 
any single group test is between the Myers and the Binet absolute, 
0.67, and between the Myers and the teacher’s ranking, 0.55. 

Based purely upon the size of the r obtained from the teacher’s 
ranking, the Binet and the Myers rate the pupils the most accurately, 
while the Detroit is decidedly the most inaccurate, the difference 
between the Binet absolute and the Detroit amounting to 0.19. Based 
purely upon the size of the r obtained from the Binet absolute scores, 
the Myers ranks highest, while the teacher’s judgment is superior 
to the Pressey and Detroit. The difference between the Myers and 
the Detroit amounts to 0.24. 

TABLE II.—DIFFERENCES IN RANK ORDER 

                                   Average       Range of rank        Per cent of rank
Tests compared                     rank          differences          differences
                                   difference                         exceeding 10 steps

Between Rankings Based on Absolute Scores 

Pressey and Myers                  8.97          From 0 to 22         41.1
Pressey and Detroit                7.08          From 0.5 to 17       32.3
Myers and Detroit                  §.41          From 0 to 20.5       38.2
Binet and Pressey                  8.35          From 0 to 28         29.4
Binet and Myers                    5.0           From 0 to 22         11.7
Binet and Detroit                  7.76          From 0 to 22         29.4
Teacher and Pressey                8.7           From 0 to 28         41.1
Teacher and Detroit                9.64          From 1 to 25.5       47.0
Teacher and Myers                  8.05          From 0.5 to 25       38.2
Teacher and Binet                  8.23          From 0 to 21         38.2
Average of three group tests
  and Binet                        7.55          From 1 to 25.5       20.5

Between Rankings Based on Relative Scores (IQ’s) 

Pressey and Myers                  9.67          From 0 to 24.5       38.2
Binet and Myers                    6.88          From 0.5 to 22       23.5
Binet and Pressey                  9.53          From 1 to 26         38.2
Binet and Detroit                  7.17          From 0.5 to 22.5     23.5
Pressey and Detroit                8.26          From 0 to 25.5       29.4
Myers and Detroit                  8.03          From 0.5 to 24       32.3

Between Rankings Based on Binet Relative (IQ) and Group Absolute Scores, or the 
Teacher’s Rating 

Binet and Myers                    7.79          From 0.5 to 22       29.4
Binet and Pressey                  9.94          From 0.5 to 31.5     44.1
Binet and Detroit                  10.50         From 0 to 25         50
Binet and teacher                  6.79          From 0 to 22         35.3

While these correlations cannot be considered to be very high, 
with possibly one or two exceptions, the analyses which follow seem to 
force us to the admission that, small as they are, they are nevertheless 
fictitiously high, possibly because of the almost consistent tendency 
of the Binet to rate the pupils higher than the group tests. Thus 
29 pupils are rated higher by the Stanford than by the Myers, while 
the reverse is true for only one case. The total excess rating given 
by the Binet for the 29 pupils amounts to 43.5 years, compared with 
an excess of 0.2 year for the one case which is rated higher by the 
Myers. (See Table V for similar data relating to the other tests.) 
When one scale grades consistently high and another scale consistently 
low, we may obtain a high correlation coefficient between them 
although there would be a pronounced age difference in the rating 
between the two scales, assuming that the differences were large. In 
other words, a high correlation coefficient does not demonstrate the 
accuracy of the ratings obtained by a comparative scale in terms of a 
standard of accepted or demonstrated accuracy. As I have previously 
remarked, “the correlation coefficient has often, in my experience, 
proved to be a clever device for concealing the true state of affairs— 
a fictitious refuge in our search for security. . . . Plus and minus 
deviations are not neutralized or concealed when individual cases 
are compared. . . . Even a high correlation coefficient cannot 
demonstrate the accuracy of a particular measurement.”¹ 
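
The point that a high correlation coefficient can coexist with a large and consistent difference in rating may be illustrated by a small sketch. The age-ratings below are invented for the purpose, and the second scale is constructed to rate every pupil exactly 1.5 years lower than the first; nothing here is taken from the pupils of this study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented age-ratings (years) for six pupils on a reference scale.
reference = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
# A second scale that rates every pupil exactly 1.5 years lower.
comparison = [a - 1.5 for a in reference]

r = pearson_r(reference, comparison)
mean_gap = sum(a - b for a, b in zip(reference, comparison)) / len(reference)
print(f"r = {r:.2f}, mean difference = {mean_gap:.1f} years")
```

The coefficient is as high as it can be, although no pupil receives the same rating from the two scales; only the pupil-by-pupil differences reveal the disagreement.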


AGREEMENT IN RANK ORDER 


The tests are so paired in Table II as to permit of 21 comparisons 
of the rank differences in the placement of the pupils by the different 
tests or by the teacher and the tests. The analysis will be restricted 
to a few of the comparisons contained in the table. 

By examining the second column, “range of rank differences,” 
which gives the smallest and the largest differences found in the rank 
placement of any given pupil by two tests (based on both the absolute 
and the relative scores) or by one test and the teacher’s score, it will 
be seen that some pupils were assigned exactly the same rank order, 
while the maximal differences in the ranking varied, in the different 
comparisons, from 17 steps to 31.5 steps out of 33 possible steps of 
difference. To illustrate: The maximum difference in the ranking of 
the same pupil, based on the absolute scores, amounted to 22 steps 
between Pressey and Myers, Binet and Myers, and Binet and Detroit; 
and to 28 steps between the teacher and Pressey, and 21 steps between 
the teacher and Binet. Based on the relative scores it amounted 
to 25.5 steps between Pressey and Detroit, and 26 steps between 
Binet and Pressey. Between the Binet relative and the Pressey 
absolute scores it reached 31.5 steps. The Pressey ranked this 
pupil 31.5 steps higher than did the Binet. The smallest maximum 
range in any of the comparisons is found between the Pressey and 
Detroit absolute, 17 steps, followed by the teacher and Binet absolute, 
21 steps. 

1 Wallin, J. E. Wallace: The Theory of Differential Education as Applied to 
Handicapped Pupils in the Elementary Grades. Journal of Educational Research, 
1922, pp. 209-224. 


The last column gives the percentage of subjects whose rank differences 
by any two tests exceeded 10 steps. The smallest percentage is for 
the Binet and Myers (absolute scores), followed by the Binet and 
Myers and Binet and Detroit (relative scores). In the case of the 
Binet and Myers absolute, 11.7 per cent of the pupils differed by more 
than 10 steps; in the case of the Binet and Myers and Binet and Detroit 
relative scores the corresponding figure was 23.5 per cent. When 
the Binet relative and the Detroit absolute scores, and the teacher’s 
and Detroit absolute scores are compared, the rank difference exceeds 
10 steps for 50 and 47 per cent, respectively, of all the pupils. When 
the Binet and average of the 3 group tests are compared, the difference 
in the ranking exceeds 10 steps for 20 per cent of the pupils. 

Based upon the averages in the first column, the smallest average 
difference in the ranking of all the pupils by any two tests amounts to 
5 steps, between the Binet and Myers absolute scores, while the largest 
amounts to 10.5 steps, between the Binet relative and Detroit absolute 
scores. The average difference amounts to over 7.0 steps in all except 
four of the 21 comparisons. The average rank difference between 
the teacher’s rating and the Binet IQ is 6.8 steps, and between the 
teacher’s rating and the Binet absolute 8.2 steps. Based on the Binet, 
the smallest average difference is given by the Myers and Detroit, 
the average difference in rank order based on the absolute scores 
being 5.0 and 7.76, respectively, and based on the relative scores, 
6.88 and 7.17, respectively. If the group tests are averaged the 
difference with the Binet absolute still amounts to 7.5 steps. 
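
The three summary figures of Table II (average rank difference, range, and per cent exceeding 10 steps) can be sketched as follows. The scores are invented, and the handling of identical scores is an assumption: tied pupils are here given the mean of the ranks they span, which would account for the half-step differences in the table but is not stated in the text. The function names are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) downward; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of pupils tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_difference_summary(a, b, threshold=10):
    """Average, smallest and largest rank difference, and per cent over threshold."""
    diffs = [abs(x - y) for x, y in zip(average_ranks(a), average_ranks(b))]
    pct = 100.0 * sum(d > threshold for d in diffs) / len(diffs)
    return sum(diffs) / len(diffs), min(diffs), max(diffs), pct

# Invented raw scores for five pupils on two tests; the threshold is
# lowered to 2 steps because the illustrative class is so small.
test_a = [40, 35, 35, 20, 10]
test_b = [12, 30, 25, 28, 5]
print(rank_difference_summary(test_a, test_b, threshold=2))
```

With 34 pupils the largest possible difference is 33 steps, so an average difference of 7 or 8 steps means that a pupil is typically displaced by about a quarter of the class.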

The difference in the ranking of the pupils by two tests appears 
strikingly in the graph, which gives the difference in the ranking of 
each subject between the Binet and the average of the three group 
tests. 

The extent of the disagreements between the different tests is 
also shown by an analysis of the 


AGREEMENT IN QUARTILE GROUPING 


If the teacher were to section the pupils into 4 groups according to 
test ability, she would find the greatest amount of agreement in the 
two extreme quartiles, containing the best and the poorest pupils, 
Table III. But even so, we find that in only 2 of the 10 possible 
comparisons between the rankings of two tests, or of one test and the 
teacher’s rating, is there agreement on more than half of the pupils 
who should be assigned to the highest or the lowest quartile. Pressey 
and Detroit agree on 66.6 per cent and the teacher and Myers on 
55.5 per cent for the highest quartile; and the Binet IQ and the Myers 
on 62.5 per cent and the teacher and Binet IQ on 75 per cent, who 
should be assigned to the lowest quartile. The lowest agreement is 
between the Binet IQ and Detroit which agree on only 12.5 per cent 
of the pupils for lowest quartile. Altogether the highest agreement is 
between the teacher and the Binet IQ, 44.4 and 75 per cent for the 
first and fourth quartiles; and possibly between the teacher and Myers, 
55.5 and 50 per cent, respectively. Using the Binet as the standard 
of reference, the teacher and the Myers show the greatest amount of 
agreement. 

TABLE III.—AGREEMENTS FOUND BETWEEN THE GROUP TESTS IN THE QUARTILE 
GROUPING OF THE CHILDREN 

                                     Per cent      Per cent      Per cent      Per cent
Tests                                placed in     placed in     placed in     placed in
                                     first         second        third         fourth
                                     (highest)     quartile      quartile      quartile
                                     quartile

By Pressey and Myers                 33.3          12.5          4             37.0
By Pressey and Detroit               66.6          12.5          33.3          25.0
By Myers and Detroit                 33.3          25.0          11.1          37.0
By Binet and Myers                   33.3          12.5          33.3          62.5
By Binet and Detroit                 44.4          12.5          22.2          12.5
By Binet and Pressey                 33.3          25.0          11.1          37.0
By teacher and Binet                 44.4          12.5          33.3          75.0
By teacher and Myers                 55.5          25.0          11.1          50.0
By teacher and Pressey               33.3          25.0          22.2          50.0
By teacher and Detroit               44.4          25.0          11.1          25.0
By Pressey, Myers and Detroit        22.2          0             0             25.0
By Pressey, Myers, Detroit and
  Binet                              0             0             0             12.5
By Pressey, Myers, Detroit and
  teacher                            [illegible]   0             0             25.0
By Pressey, Myers, Detroit,
  Binet and teacher                  0             0             0             5.8
By average of three group tests
  and teacher                        8.8           8.8           5.8           11.7
By average of three group tests
  and Binet                          8.8           5.8           5.8           8.8

Based on the 34 pupils who were given all the tests. Nine pupils were assigned 
to the first quartile, 8 to the second, 9 to the third, and 8 to the fourth. 

The absolute scores are used in the group tests and the relative (IQ) in the 
Binet. 

The three group tests do not agree on more than 25 per cent of 
the cases; the three group tests and the Binet IQ on more than 12.5 
per cent, and the three group tests and the teacher on more than 
25 per cent, all in the fourth quartile. The 3 group tests, the Binet 
IQ and the teacher agree on only 5.8 per cent of the cases in the fourth 
quartile; the average of the 3 group tests and the teacher on only 
8.8 per cent in the first and 11.7 per cent in the fourth quartiles; and 
the average of the 3 group tests and the Binet IQ on only 8.8 per cent 
in the first and fourth quartiles. (The figures differ only slightly 
in these two comparisons if the Binet absolute scores are substituted.) 

Turning to the middle quartiles, we find that in only three of the 
ten comparisons between two tests or between one test and the teacher 
is there agreement on as many as one-third of the pupils who should 
be assigned to either quartile, namely between the Pressey and Detroit, 
the Binet and Myers, and the teacher and Binet, all in the third 
quartile. In about half of the comparisons the tests agree on less than 
13 per cent of the pupils. None of the tests possesses any distinct 
superiority in selecting the pupils for the second and third quartiles, 
but the agreement with the Binet is greatest for Myers and the 
teacher. Neither the three group tests, nor the three group tests and 
the Binet, nor the three group tests and the teacher agree on a single 
child for the middle quartiles. The average of the three group tests 
and the Binet agree on only 5.8 per cent of the pupils, for the middle 
quartiles, and the average of the three group tests and the teacher 
on 8.8 and 5.8 per cent respectively, for the second and third quartiles. 
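
The quartile agreements discussed above can be computed as in the following sketch. The quartile sizes 9, 8, 9, 8 are those reported for the 34 pupils; the scores are invented, the function names are hypothetical, and ties are simply left in input order, whereas in the study identical scores were arranged by chance.

```python
QUARTILE_SIZES = [9, 8, 9, 8]  # group sizes reported for the 34 pupils

def quartile_sets(scores, sizes=QUARTILE_SIZES):
    """Split pupil indices into quartiles, highest scores first.

    Ties are left in input order here; the study broke them by chance.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    groups, start = [], 0
    for size in sizes:
        groups.append(set(order[start:start + size]))
        start += size
    return groups

def quartile_agreement(a, b, sizes=QUARTILE_SIZES):
    """Per cent of each quartile's pupils placed there by both rankings."""
    qa, qb = quartile_sets(a, sizes), quartile_sets(b, sizes)
    return [100.0 * len(x & y) / size for x, y, size in zip(qa, qb, sizes)]

# Invented scores for eight pupils, split into four groups of two.
first = [90, 80, 70, 60, 50, 40, 30, 20]
second = [85, 60, 75, 70, 30, 45, 20, 25]
print(quartile_agreement(first, second, sizes=[2, 2, 2, 2]))
```

With nine pupils in a quartile each agreement contributes 11.1 per cent, and with eight pupils 12.5 per cent, which is why the entries for pairs of tests in Table III fall on multiples of these values.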

The disagreements appear equally evident when the scores are 
translated into age-ratings and critically analyzed. 


AGREEMENT IN INTELLIGENCE AGE-RATING 


When the average intelligence ages by the different tests are 
compared in Table IV, there is, indeed, fair agreement. The average 
Binet intelligence age is 7.0 years, which is 0.2 year higher than the 
average and 0.4 year higher than the median chronological age. The 
Detroit gave practically the same result, while the Pressey and Myers 
ratings were 0.3 and 0.8 of a year lower. The differences between 
the Binet averages and the averages in the group tests are negligible 
except in one case. 

TABLE IV.—AVERAGE INTELLIGENCE AGE ACCORDING TO THE DIFFERENT TESTS 

                             BINET     DETROIT   PRESSEY   MYERS
Number of cases              34        34        26        30
Average intelligence age     7.0       6.9       6.7       6.2

But the extent to which the individual intelligence age-ratings 
obtained by the tests agree or disagree cannot be inferred from the 
average intelligence ages, but only by computing the difference in 
the age-ratings obtained for each pupil by the different tests. Table V 
gives the range of differences in decimals of a year between the 
age-ratings obtained by the subjects in the two tests compared; i.e., the 
range from the subject who showed the smallest age difference to 
the one who showed the largest age difference in the tests compared; 
the percentage of cases who differed by 2 years or more and by 1 
year or more in the intelligence age-rating obtained in the two tests 
compared; and the number of subjects who rated higher in each of 
the tests compared, together with the total and the average amount of 
difference in the intelligence rating in terms of years. This table 
demonstrates with peculiar force how widely discrepant the verdict 
of two tests may be on the same pupils. Thus the maximum difference 
in the rating of any pupil in this group amounted to 3.1 years as 
between the Myers and Stanford, 2.8 years as between the Detroit and 
Stanford, and Pressey and Stanford, 2.5 years as between the Pressey 
and Myers, 1.9 as between the Pressey and Detroit, and 1.7 as between 
the Myers and Detroit. These figures, of course, represent the 
extreme cases. Nevertheless the table shows that the rating differs 
by 2 years or more for 24 per cent of the pupils when the Pressey 
and Myers are compared, for 15.4 per cent when the Pressey and Stan- 
ford are compared, and for 13.3 per cent when the Myers and Stanford 
are compared. The difference amounts to 1 year or more for 83.3 
per cent of the cases when the Myers and Stanford are compared, 
for 29.4 per cent when the Detroit and Stanford are compared, for 
43.3 per cent when the Pressey and Stanford are compared, and for 
23 per cent when the Pressey and Detroit are compared, which showed 
the highest agreement from the point of view of this criterion. When 
test results differ by 1 year or more on from 23 to 83.3 per cent of 
children who average only 6.8 years of age the suspicion is irresistible 
either that the norms are inaccurate, or that the tests are imperfect or 
worthless, or that the tests measure different things, or that the 
measurement of human intelligence is, fundamentally, so intricate 
and difficult that it cannot be accurately done by group tests. It 
was my conviction that the accurate measurement of intelligence by 
group tests would be far more difficult in the case of young children 
than in the case of older ones.¹ But this seems not to be true, judging 
by the results of Stenquist based on five group tests of intelligence 
given to a class selected at random from the upper elementary grades 





1 School and Society, 1921, p. 34. 

of a school in New York City,¹ and by the results of four intelligence 
tests given to the sixth, seventh and eighth grades in the Miami 
University practice school. The difference between the intelligence 
age-rating obtained in a single test compared with the average of 
all five tests varied from 2.08 years to 2.58 years for over 18 per cent 
of the New York pupils. The corresponding differences between 
the rating obtained from the individual scales, having regard for the 
plus and minus deviations, varied from 3.16 years to 4.08 years. 
The median difference in the intelligence age-rating of the upper grade 
pupils in the Miami practice school obtained by the Stanford-Binet 
and the Illinois Examination (the only scales with directly comparable 
norms) amounted to 11 months, varying from no difference to 3.8 
years. The differences ranged from 13 months to 3.8 years in 43.8 per 
cent of the cases and from 2 years to 3.8 years in 17 per cent.² 
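
The kind of summary that Table V is described as giving for each pair of tests can be sketched as below. The age-ratings are invented and the function and key names are hypothetical; the sketch only shows how the largest difference, the percentages differing by 1 and by 2 years or more, and the number and total excess of the pupils rated higher by one scale follow from the pupil-by-pupil differences.

```python
def age_difference_summary(ages_a, ages_b):
    """Summarize per-pupil differences between two sets of age-ratings (years)."""
    diffs = [a - b for a, b in zip(ages_a, ages_b)]
    gaps = [abs(d) for d in diffs]
    n = len(diffs)
    return {
        "largest": max(gaps),
        "pct_1yr_or_more": 100.0 * sum(g >= 1.0 for g in gaps) / n,
        "pct_2yr_or_more": 100.0 * sum(g >= 2.0 for g in gaps) / n,
        "n_a_higher": sum(d > 0 for d in diffs),
        "n_b_higher": sum(d < 0 for d in diffs),
        "total_excess_a": sum(d for d in diffs if d > 0),
    }

# Invented age-ratings for five pupils on two scales.
scale_a = [7.0, 8.5, 6.0, 9.0, 7.5]
scale_b = [6.5, 6.0, 6.25, 8.0, 7.5]
print(age_difference_summary(scale_a, scale_b))
```

Unlike the averages of Table IV, in which plus and minus differences cancel, every figure in such a summary is built from the individual differences, so that disagreement on particular pupils cannot be hidden.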

The results obtained by Stenquist, Guiler, and myself from experi- 
mental investigations of the amount of disagreement found between 
different tests are quite disappointing and must seriously disturb the 
confident belief, so generally accepted at present, that group tests give 
an accurate measurement of “general intelligence” and a highly 
reliable and accurate means of sectioning pupils according to their 
ability for the purpose of instruction. If this belief is justified the 
conclusion seems unavoidable that the tests used by the above three 
writers (and the tests used by other workers which show large discrep- 
ancies) measure qualities which are so different as to be practically 
incommensurable. 

At any rate, so long as different tests which are assumed to measure 
the same thing (e.g., ‘‘general intelligence,”’ or ‘‘ verbal intelligence, ’’ 
or ‘non-verbal intelligence,”’ or ‘native ability,’’ or “alertness’’) 
give glaring discrepancies in the rating of any considerable number of 
subjects, we cannot evade the issues: Is it possible to obtain accurate, 
reliable measures of mental traits by means of group tests? If so, is it 
possible to obtain such a comprehensive measure of “general intelli- 
gence’’ by a brief group test based on the tests of a very limited number 


1 Stenquist, John L.: Unreliability of Individual Scores in Mental Measure- 
ments. Journal of Educational Research, 1921, pp. 347-354. 

2 These figures are based on a table supplied me by Guiler, who has since 
published an analysis of the disagreements of the tests based on the 1Q’s: Guiler, 
Walter S.: How Different Mental Tests Agree in Rating Children. The Elemen- 
tary School Journal, 1922, pp. 734-744. It is unfortunate that the writer did not 
analyze the results for the absolute scores also. 











of 
qui 
suc 


“é g 
the 
th’ 
dit 
in’ 
in 
te 





Consistency of Intelligence Ratings 245 


of traits arranged in an artificial setting, as will accurately and ade- 
quately reflect intelligence as it manifests itself in varying degrees of 
successful adjustment to the work of the school, mart, factory, farm, 
office or playground? If so, do the existing tests alleged to measure 
“general intelligence” (or “verbal” or “‘non-verbal”’ intelligence, as 
the case may be) give us accurate measures of intelligence viewed in 
this broad, comprehensive way? Or do they give us measures of 
different mental qualities or traits and of very limited aspects of 
intelligence as a whole? The solution of questions such as these is, 
in my judgment, now more important than the devising of new 
tests or the wholesale application of tests.

THE CONSTANCY OF INTELLIGENCE QUOTIENTS
WITH BORDERLINE AND PROBLEM CASES 


V. A. C. HENMON, 
University of Wisconsin 
and 
HELEN M. BURNS 
Supervisor of Special Classes, Madison, Wisconsin 


The constancy of intelligence quotients is a matter of such practical 
moment that significant data should be promptly reported. This 
is particularly true of borderline cases and such others as, for one 
reason or another, arise in a school system and are referred to the 
supervisor of special classes or other examining agency for diagnosis.
The cases whose intelligence quotients lie between 60 and 80, or between
65 and 75, present a peculiarly troublesome problem to the diagnostician
who is expected to make disposition of them or make recommendations
on the basis of which proper disposition can be made. The
reliability and constancy of the intelligence quotient in these cases 
is of very special importance. For all we know the validity, reliability 
and constancy of tests may be greater or less with this group than with 
those very definitely defective or definitely superior. Few of the
reports that have recently appeared deal specifically with this group.1

This study deals with 77 pupils who have been referred to one of the
writers, Miss Burns, Supervisor of Special Classes in the Madison 
Schools, and who have been retested one or more times. All of the 
examinations have been made either by Miss Burns or by Dr. Elizabeth 
L. Woods, State Supervisor of Special Classes. Both are trained and 
experienced examiners. In 59 of the cases, both examinations were 
made with the Stanford Revision. In 18 cases, the first test was 
made with the Goddard Revision of 1911 and the second with the 
Stanford Revision. The distribution of the 77 cases by intelligence
quotients is as follows: 





1 See summary of 8 studies by Rugg and Colloton, Constancy of the Stanford- 
Binet IQ as shown by retests. Journal of Educational Psychology, Vol. XII, 
No. 6, September, 1921; and subsequent articles in the same journal by Wallin, 
Vol. XII, No. 8, October, 1921; by Stenquist, Vol. XIII, No. 1, January, 1922; 
and by Gordon, Vol. XIII, No. 6, September, 1922.

TABLE I

IQ            Stanford      Stanford      Goddard       Stanford
              first test    second test   first test    second test

 40- 50           1             2             1             1
 50- 60           6             4             0             1
 60- 70          14            17             3             6
 70- 80          14            15             5             2
 80- 90          11            10             3             4
 90-100           7             6             2             2
100-110           6             5             1             2
110-120           0             0             3             0
Median IQ        76            73             7[?]         73











This indicates the nature of the cases studied, a large share of
them having intelligence quotients between 60 and 90. Some of
them are above 100 but are included since they were referred for
examination.


The constancy of the intelligence quotients may be shown in the
usual ways: (1) by the correlations between the first and second
tests, (2) by the average or median difference between the test and
retest, and (3) by the limits of difference which would include 50 per
cent of the cases.

TABLE II.—CORRELATION TABLE FOR 59 STANFORD REVISION RETESTS
(IQ, first test, against IQ, second test; body of table not recoverable)

The correlation table and coefficient of correlation for the 59 
cases tested both times by the Stanford Revision appear in Table II. 
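
As a rough illustration of the three measures just named, the following Python sketch computes them for a pair of test and retest lists. The IQ values are hypothetical and are not taken from the 59 cases; the function names are likewise assumptions made for illustration. The quartiles of the changes are used here as one reasonable reading of "the limits of difference which would include 50 per cent of the cases."

```python
from math import sqrt
from statistics import mean, median, quantiles


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def constancy_summary(first, second):
    """Summarize retest constancy: correlation, average and median
    difference, and the limits enclosing the middle half of the changes."""
    changes = [b - a for a, b in zip(first, second)]  # retest minus first test
    q1, _, q3 = quantiles(changes, n=4)               # first and third quartiles
    return {
        "r": pearson_r(first, second),
        "mean_abs_diff": mean(abs(c) for c in changes),
        "median_change": median(changes),
        "middle_50_limits": (q1, q3),
    }


# Hypothetical first-test and retest IQs, for illustration only.
first = [62, 68, 71, 75, 78, 83, 90, 104]
second = [60, 65, 72, 70, 74, 85, 88, 101]
summary = constancy_summary(first, second)
print(summary)
```

A negative median change in such a summary would correspond to a typical loss on retest, a positive one to a typical gain.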

The coefficient of 0.91 agrees well enough with those previously 
reported. Terman with 428 cases found a coefficient of 0.93, Cuneo 
and Terman for three groups of 25, 21, and 31 cases found coefficients 
of 0.95, 0.94 and 0.85 respectively; Rugg and Colloton for 137 cases 
report a coefficient of 0.84; Gordon for 44 cases reports a coefficient 
of 0.84; while Stenquist for 274 cases found a coefficient of 0.72. In 
view of the considerable differences in the heterogeneity of the groups 
and hence the differences in variability, these coefficients are not com- 
parable. The determination of the average differences gives a better 
basis of comparison. 

The average difference for the 59 cases is 5.3 points IQ. For
the 18 cases given the Goddard Revision at the first test and the Stanford
Revision at the second test, the average difference is 9.0 points
IQ. All but 3 of the 18 cases in the second group show a loss. In
Rugg and Colloton’s summary table there are two reports in which
the Goddard Revision was used in the first test and the Stanford
Revision in the second. Garrison with 62 cases found an average
difference of 4.66 points IQ, while Wallin for 120 cases found a difference
of 10.2 points IQ. Obviously our results are in much closer
agreement with those of Wallin.

The average differences where the Stanford Revision was used in
both tests, reported by Terman as 4.5 PE, by Garrison as 4.66, by
Poull as 4.6, and by Rugg and Colloton as 4.7, are somewhat less than
the 5.3 points IQ which we have found. Our results in turn show
greater constancy than those of Gordon (6.8), Wallin (6.1), and Terman
and Stenquist (7.5).

By the formula $PE = 0.6745\,\sigma\sqrt{1 - r}$, with a correlation of
0.91 and a standard deviation of 15.6, the probable error of measurement
is 3.15 points. This agrees with the determinations of Otis,
Terman and Rugg who have found that the PE in terms of IQ is about
3 points.
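
The arithmetic can be checked with a short sketch. It assumes the formula is the usual one for the probable error of a score, namely 0.6745 times the standard deviation times the square root of one minus the retest correlation; the function names are hypothetical. With the figures given, the product comes to about 3.16, which agrees with the reported 3.15 to within rounding.

```python
from math import sqrt


def probable_error(sd, r):
    """Probable error of a score from the SD and the retest correlation."""
    return 0.6745 * sd * sqrt(1.0 - r)


def implied_correlation(pe, sd):
    """Invert the same formula: the correlation implied by a fixed
    probable error in a group with standard deviation sd."""
    return 1.0 - (pe / (0.6745 * sd)) ** 2


pe = probable_error(15.6, 0.91)
print(round(pe, 2))

# Same error of measurement, but a hypothetical, less variable group (SD = 10).
r_narrow = implied_correlation(pe, 10.0)
print(round(r_narrow, 2))
```

The second function bears on the earlier remark that coefficients from groups of unequal heterogeneity are not comparable: under this formula, the same error of measurement in a hypothetical group with a standard deviation of 10 would correspond to a correlation of roughly 0.78 rather than 0.91.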

The range of difference for the Stanford Revision cases runs 
from — 13 points to + 15 points. The limits within which 50 per cent 
of the cases lie fall at — 6 points and + 3 points. In this respect, our 
results are very different from others. Rugg and Colloton state 
that “For all studies the positive differences are nearly twice as
large as the negative differences.” In our results this is not the case.
Out of the 59 cases, 31 show a loss in the retest, the average loss being
6.3 points, while 24 cases show an average gain of 5.9 points, with
4 cases yielding an identical score. The median change is a loss of
1.75 points IQ and the average difference 4.8 points. The typical
result to be expected with such a group then is a loss rather than a 
gain. This appears more markedly in the 28 cases whose first tests 
gave IQ’s between 60 and 80. Twenty of the 28 cases show an average 
loss of 5.2 points, while 8 show an average gain of 3.7 points. The 
median change with this group is a loss of 3.7 points. It is well 
established that with feebleminded subjects the IQ’s tend to decrease.
This is the evident tendency with borderline cases also and it is a very 
significant fact in making provisions for them and predictions con- 
cerning them. 

Two other points remain to be noted. The effect of the length 
of the interval between the first and second tests for those tested 
in both instances by the Stanford Revision is shown to be as follows: 


INTERVAL              NUMBER OF CASES    AVERAGE DIFFERENCE
9 months-1 year              4            7.0 points IQ
1 year-1½ years             21            4.1 points IQ
1½ years-2 years             9            5.1 points IQ
2 years-2½ years            20            5.2 points IQ
2½ years-4 years             9            6.2 points IQ
Total                       59            5.3 points IQ


It is evident that within the rather narrow time limits here 
involved, any effect of the length of interval is not revealed. The 
number of cases is, of course, too small to show any general tendency. 

The sex difference noted by Gordon1 recently is not in evidence.
The average difference for 34 boys and 25 girls is the same, viz., 5.3
points. In the case of the boys, 19 differences were positive and 12
negative. In the case of the girls, 12 differences were positive and
12 negative. In Gordon’s results “most of the losses were with the
girls and most of the gains with the boys.”

In summary, our results for 59 cases, with intervals from 1 to 4
years, give a correlation of 0.91 between the first and second tests
and an average difference of 5.3 points. The probable error of measurement
is therefore about 3 points IQ. The significant point brought
out by the study is the evident tendency, with borderline cases, for
intelligence quotients to decrease, a tendency most marked in the
group whose initial IQ’s lie between 60 and 80.


1 Gordon, Kate: Some retests with the Stanford-Binet Scale. Journal of
Educational Psychology, Vol. 13, No. 6, September, 1922.

NEW PUBLICATIONS IN EDUCATIONAL
PSYCHOLOGY AND RELATED FIELDS OF
EDUCATION

CONDUCTED BY LAURA ZIRBES1


1. The Status and Future of “Behavioristic” Psychology.—There is
and, for some time, has been a need for a thorough and scholarly study of the
history and present status of “behaviorism” as a movement in
psychology. It would be expected of anyone undertaking this task
that nothing short of a penetrating analysis of the growth of many
“schools” or “systems,” not only in psychology, but in philosophy,
biology and other related subjects, would be completed before emphatic
pronouncements would be made. Roback2 has exceeded expectations
in point of courage if not in matters of thoroughness in his
recent book. He has given Watson, “the enfant terrible of behaviorism,”
what he considers to be a well deserved spanking, and he has not
spared the rod in other cases. But perhaps many readers will feel that 
Roback has been motivated by an impatience that led to the administer- 
ing of punishment before listening to the whole story of the offenders 
or without considering evidence that other interested parties might 
have contributed. He has treated behaviorism as a menace against 
which a sharp polemic attitude must be taken. Perhaps many who 
are, like the reviewer, unsympathetic with extreme behaviorism will 
nevertheless feel that such an attitude is unwise even if not altogether 
futile. 

Roback’s book is based mainly on a series of controversial articles 
and a few books, rather than upon an analysis of fundamental trends 
represented in the work of recognized leaders, past and present. For 
example, in treating the “antecedents of behaviorism” no mention is
made of Judd’s writings on motor attitudes, while Thorndike’s discussions
of the evolution of the mind and such extensive theories as those of
Washburn in Movement and Mental Imagery are barely mentioned. In
treating the “Varieties of Behaviorism” and other topics, we are given
















1 Unsigned reviews prepared by Laura Zirbes. 
2 Roback, A. A.: “Behaviorism and Psychology.” Cambridge, University
Bookstore, 1923, pp. 284.

a series of quotations, some of which may have been unwittingly made,
plucked from various sources, and interspersed with comments after 
the fashion of the Literary Digest instead of a comprehensive view of 
movements within the science itself. 

Part II of the book is a direct critical attack upon some of the 
principles of behaviorism. The inadequacy of the Watsonian accounts
of memory and other types of learning and activity is presented
together with a less effective because less important attack upon the
logic of particular behaviorists. A large portion of the book—all of
Part III, which is the longest, and a section of Part II—is devoted to
discussions of the incompatibility of behaviorism and philosophy,
ethics, jurisprudence, medicine, religion, and the demands of life. We
are told that “Watson’s substitute for thought is untenable in (the)
eyes of (the) law;” that the “religious consciousness is not reducible
to non-mental components,” etc. The behaviorist is not likely,
however, to be frightened from his position by these or other practical
difficulties in making himself understood. The book includes a bibliography
of books and articles bearing on the “issue between behaviorism
and psychology.” It includes an appendix on “Intelligence and
Intellect” and another on “How is Psychology Defined?” that seem
to have no significant relation to the main body of the book.

Psychologists will find in the book many admirable passages, a 
useful list of references, and an interesting classification of types of Pre- 
Behaviorisms, Behaviorisms Proper, Psycho-Behaviorisms, and Nom- 
inal Behaviorisms. The elementary student will, of course, find such 
a book unintelligible. A. I. G.





2. The Effectiveness of Visual Instruction.—We need scientific data
to ascertain just how much the effectiveness of instruction is enhanced
by the use of certain visual aids. Such auxiliary materials and
methods need to be compared and evaluated with reference to sound
educational criteria. The following phases of this problem were
studied in an investigation of the learning of seventh grade pupils:1 (1)
The effectiveness of informational moving pictures in combination 
with verbal instruction; (2) The value of simple drawings in creating 
composite visual images; (3) The value of diagrams in developing 


1 Weber, Joseph J.: “Comparative Effectiveness of Some Visual Aids in
Seventh Grade Instruction.” Chicago, The Educational Screen, Inc., 1922,
pp. 131.

relatively abstract concepts; (4) The comparative effectiveness of
four different methods of presentation, viz.: Study of the printed 
page, oral instruction by the teacher, silent observation of a film, and 
observation of a film, accompanied by a lecture or explanatory remarks. 
The conclusions are based on the scores obtained by about 500 pupils 
on 3 kinds of tests. The differences in effectiveness are noticeable but 
so surprisingly small as to seem almost insignificant. An investigation 
into pupil preferences shows that pupils’ choices favor the use of films.
We wonder whether the effects of various methods of presentation do 
not vary much more with reference to permanence of impressions and 
whether an investigation should not consider that possibility. The data
presented in this study show that visual aids in the form of pictures or 
diagrams are more necessary when the material is foreign to the pupil’s 
actual experience, or abstract in its nature. Further investigations are 
needed and some should be carried on with younger children. 

It is too bad that a study manifestly concerned with the effective- 
ness of visual aids should neglect to provide in its published form so 
many of the visual cues by means of which readers of such studies are 
saved time and trouble. 





3. The Biology and Psychology of Child Nurture.—This book is a 
plea for the more intelligent understanding of the interlocking relations 
of heredity and environment, and an application of such knowledge to 
the related problems of eugenics, prenatal care and child culture.1 It
is written from the standpoint of a physician, one who has a vision of 
the tremendous social significance of the conservation of childhood. 





4. The Scientific Method and Education.—This monograph2 is
largely devoted to an expository treatment of the methods of the scientist
as distinct from the methods of the scholar. This is to furnish the
background for training the student in the “scientific method of
interrogating nature.” One wonders to what extent such a reconstruction
of the historical scientific method results in an overestimation
of the rôle played by inductive-deductive thinking as a conscious





1 Chapin, Henry Dwight: “Heredity and Child Culture.” New York, E. P.
Dutton and Company, 1922, pp. viii + 219.

2 Sanford, Fernando: “How to Study, Illustrated Through Physics.” New
York, The MacMillan Co., 1922, pp. vi + 56.

organizing agency, and an underestimation of the extent and significance of
trial success and accidental discovery. Moreover the reader whose 
expectations have been raised by the attractive title cannot avoid 
being disappointed at the omission of any concrete approach to the 
classroom situation. Without minimizing the value of this contribu- 
tion, we can maintain that there is a real need at this time, not so 
much for a further definition and analysis of the scientific method, as 
for at least a preliminary experimental excursion into the identifica- 
tion and evaluation of scientific method as it can be expressed in the 
specific activities and materials of the classroom. 


J. G. KUDERNA.





5. Investigations of Typewriting and Stenography.—In the first of
these two monographs1 stenographic ability is analyzed into its important
components or functions as a first step in the construction
and standardization of scales for the measurement of achievement in 
shorthand. The technique of scale construction is evidence that the 
author is thoroughly conversant with the history and present status 
of educational measurement. The reasons for the selection of certain 
test elements and procedures, and the rejection of others are interest- 
ingly stated. The series of tests includes an ingenious and original 
test of reading ability in which comprehension is checked while rate 
is measured. There are two forms of a test for speed of writing and 
a 16-step scale for measuring the quality of shorthand penmanship, 
which does credit to its forbears. Each of the ten equivalent vocabulary
tests consists of one hundred common words and fifty common
phrases, the selection of which was based on four well known vocabulary
studies and a comprehensive count of phrases or word groups made 
by the author. This phrase study is significant, apart from its 
immediate purpose. There is also a scale, similar to the Ayres Spelling 
Scale, for measuring knowledge of shorthand word and phrase charac- 
ters or outlines. 

In the same series is a monograph which deals with the improvement
of speed and accuracy in typewriting.2 Four vocabulary


1 Hoke, Elmer Rhodes, Ph.D.: “The Measurement of Achievement in Shorthand.”
The Johns Hopkins University Studies in Education, No. 6. Baltimore,
The Johns Hopkins Press, 1922, pp. vii + 118.

2 Hoke, Roy Edward, Ph.D.: “The Improvement of Speed and Accuracy in
Typewriting.” The Johns Hopkins University Studies in Education, No. 7.
Baltimore, The Johns Hopkins Press, 1922, pp. 42.

studies were used in determining the relative frequency with which
the various letters and other characters on the keyboard are used. 
These data were used in connection with a study of errors in typewriting,
and a close correlation between infrequency of use and frequency
of error is revealed. Other causes of error were also studied. The
relative abilities of the eight fingers and the two hands were studied 
with interesting results. Finally the relative load assigned to each 
hand and to each finger was studied. The author concludes from the 
nature of the evidence that, inasmuch as the so-called standard or 
universal keyboards have been arranged with reference to no dis- 
coverable criteria whatsoever, greater speed and accuracy may be 
attained by a rearrangement of the keyboard based on the principles 
underlying the touch method, and by a redistribution of the loads 
assigned to the several fingers. 





6. On the Improvement of Examinations.—Anent the matter of
educational measurement and the unreliability of teachers’ marks
let it be said that practical school people can learn much from the
technique of standardized tests. This thesis is admirably set forth
in a recent bulletin.1 Following an analytical critique and a similarly
analytical defense of such examinations, numerous practical suggestions
for improvement are presented and illustrated. Directions
are given for constructing and scoring true-false examinations, recog- 
nition exercises and completion tests. 





7. Biology and the Public Press.2—In the belief that the careful
determination of the curriculum of secondary schools necessitates numerous
quantitative studies, the joint authors of this monograph have
proceeded with one such investigation in the field of biology. They
have tried to ascertain the amount of biological information supplied
to the public through the columns of a representative selection of
American newspapers. In some 14,000 newspaper pages over 3000
articles were found to deal with some phase of biological material.
About three-tenths of the 25,000 running inches of biological matter
pertained to health. Articles on animals rank next in frequency. 





1 Monroe, Walter S.: “Written Examinations and Their Improvement.”
University of Illinois Bulletin No. 9. Urbana, University of Illinois, 1922, pp. 71.

2 Finley, Charles W. and Caldwell, Otis W.: “Biology and the Public Press.”
New York, The Lincoln School of Teachers College, 1923, pp. 151.

These, with articles on plants and food represent over 90 per cent of
the total space. From the nature of the evidence the public has out- 
grown fictitious biology but evolution seems either to have lost its 
news value or has perhaps been deleted in consideration of those who 
hold differing opinions, fictitious or otherwise. It is surprising to 
find in the press so slight a reflection of the organized attempts to 
prohibit the teaching of evolution. 

Over a hundred typical clippings are reprinted in this publication 
and the authors suggest that teachers may find these, or similar articles 
useful as points of contact or departure in instruction. 





TESTS AND PRACTICE MATERIALS 


1. Pintner-Cunningham Primary Mental Test.1—This is a non-verbal
group test for use in the classification of kindergarten and primary 
pupils. The test consists of a 16-page booklet of pictures to be marked 
by the pupil according to standardized oral directions. The coefficient 
of correlation between two trials of the test was 0.88 with one group 
and 0.93 with another. The probable error of the score has been 
found to be two points. The correlations with other mental tests 
and criteria are not so high. By means of a table based on 856 cases, 
mental age and IQ may be derived. “Scale Charts” of the “Percentile
Graph” are provided and their use is discussed in the manual of 
directions. Grade norms for mid-year are also given. The total 
time necessary for giving the test is not mentioned in the manual. 
The cost per pupil is approximately 6 cents. 





2. Cole-Vincent Group Intelligence Test for School Entrants.2—This 
is a non-verbal test for 5-, 6- and 7-year-olds. The standardization is
still in process but a recent bulletin based on 751 cases shows the test 
to be exceedingly discriminative and reliable over its whole range. 
This cannot be said of some of the primary group tests with which it 
has been compared. In another report from the field this test showed 
higher correlations with Binet mental age scores than those obtained 
when the Detroit or Dearborn tests were similarly compared with the 
Binet. 


1 Pintner, Rudolf and Cunningham, Bess V.: “Pintner-Cunningham Primary
Mental Test.” The World Book Co., Yonkers on Hudson, 1923.

2 Cole, L. W. and Vincent: “Cole-Vincent Group Intelligence Test for School
Entrants.” Bureau of Measurements, State Normal School, Emporia, Kansas.